SlideShare a Scribd company logo
1 of 38
Download to read offline
Stage Level Scheduling
Improving Big Data and
AI integration
Thomas Graves
Software Engineer at NVIDIA
Spark PMC
Agenda
§ Resource Scheduling
§ Stage Level Scheduling
§ Use Case Example
§ Demo
Resource Scheduling On Spark
Resource Scheduling
• Driver
• Cores
• Memory
• Accelerators (GPU/FPGA/etc)
• Executors
• Cores
• Memory (overhead, pyspark, heap, offheap)
• Accelerators (GPU/FPGA/etc)
• Tasks (requirements)
• CPUs
• Accelerators (GPU/FPGA/etc)
Resource Scheduling
• Tasks Per Executor
• Executor Resources / Task Requirements
• Configs
spark.driver.cores=1
spark.executor.cores=4
spark.task.cpus=1
spark.driver.memory=4g
spark.executor.memory=4g
spark.executor.memoryOverhead=2g
spark.driver.resource.gpu.amount=1
spark.driver.resource.gpu.discoveryScript=./getGpuResources.sh
spark.executor.resource.gpu.amount=1
spark.executor.resource.gpu.discoveryScript=./getGpuResources.sh
spark.task.resource.gpu.amount=0.25
Stage Level Scheduling
Overview
Spark ETL Stage Spark ML Stage
NODE NODE
GPU
CPU
CPU
Stage Level Scheduling
• Stage level resource scheduling (SPARK-27495)
• Specify resource requirements per RDD operation
• Spark dynamically allocates containers to meet resource requirements
• Spark schedules tasks on appropriate containers
• Benefits
• Hardware utilization and cost
• Ease of programming
• Application no longer required split ETL and Deep Learning into separate
applications
• Pipeline simplification
Use Cases
• Beneficial any time the user wants to change container resources between
stages in a single Spark application
• ETL to Deep Learning
• Skewed data
• Data size large in certain stages
• Jobs that use caching, switch to higher memory containers during those
stages
Resources Supported
• Executor Resources
• Cores
• Heap Memory
• OffHeap Memory
• Pyspark Memory
• Memory Overhead
• Additional Resources (GPUs, etc)
• Task Resources
• CPUs
• Additional Resources (GPUs, etc)
Requirements
• Spark 3.1.1
• Dynamic Allocation with External Shuffle Service or Shuffle tracking
enabled
• YARN and Kubernetes
• RDD API only
• Scala, Java, Python
Implementation Details
• New container acquired with new ResourceProfile
• Does NOT try to fit into existing container with different ResourceProfile
(Future Enhancement)
• Unused containers idle timeout
• Default to one ResourceProfile per stage
• Config to allow multiple ResourceProfiles per stage
• Multiple profiles will be merged with simple max of each resource
YARN Implementation Details
• External Shuffle Service and Dynamic Allocation
• YARN Container Priority – ResourceProfile Id becomes container priority
• YARN lower numbers are higher priority
• Job Server type scenario that may come into affect
• GPU and FPGA predefined, other resources require additional
configurations
• Custom resources via spark.yarn.executor.resource.* only apply in default
profile – do not propogate because no way to override
• Discovery script must be accessible – sent with job submission
Kubernetes Implementation Details
• Requires shuffle tracking enabled
(spark.dynamicAllocaiton.shuffleTracking.enabled)
• May not idel timeout if have shuffle data on the node
• Result in more cluster resource used
• spark.dynamicAllocaiton.shuffleTracking.timeout
• Pod Template Behavior
• Resource in Pod Template only used in default profile
• Specify all resources needed in the ResourceProfile
UI Screen Shots
--executor-cores 2 --conf spark.executor.resource.gpu.amount=1 --conf spark.task.resource.gpu.amount=0.5
UI Screen Shots
API
> import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder,
TaskResourceRequests}
> val rpb = new ResourceProfileBuilder()
> val ereq = new ExecutorResourceRequests()
> val treq = new TaskResourceRequests()
> ereq.cores(4).memory("6g”).memoryOverhead("2g”).resource("gpu", 2, "./getGpus")
> treq.cpus(4).resource("gpu", 2)
> rpb.require(ereq)
> rpb.require(treq)
> val rp = rpb.build()
// use the ResourceProfile with the RDD
> val mlRdd = df.rdd.withResources(rp)
> mlRdd.mapPartitions { x =>
// feed data into ML and get result
}.collect()
UI Screen Shots
UI Screen Shots
API
> rpb
Profile executor resources: ArrayBuffer(memoryOverhead=name: memoryOverhead, amount:
2048, script: , vendor: , cores=name: cores, amount: 4, script: , vendor: , memory=name:
memory, amount: 6144, script: , vendor: , gpu=name: gpu, amount: 2, script: ./getGpus,
vendor: ), task resources: ArrayBuffer(cpus=name: cpus, amount: 4.0, gpu=name: gpu,
amount: 2.0)
> mlRdd.getResourceProfile
: org.apache.spark.resource.ResourceProfile = Profile: id = 1, executor resources:
memoryOverhead -> name: memoryOverhead, amount: 2048, script: , vendor: ,cores -> name:
cores, amount: 4, script: , vendor: ,memory -> name: memory, amount: 6144, script: ,
vendor: ,gpu -> name: gpu, amount: 2, script: ./getGpus, vendor: , task resources: cpus
-> name: cpus, amount: 4.0,gpu -> name: gpu, amount: 2.0
API - Mutable vs Immutable
> ereq.cores(2).memory("6g”).memoryOverhead("2g”).resource("gpu", 2, "./getGpus")
> treq.cpus(1).resource("gpu", 1)
> rpb.require(ereq).require(treq)
> val rp = rpb.build()
> rp
: org.apache.spark.resource.ResourceProfile = Profile: id = 2, executor resources: memoryOverhead ->
name: memoryOverhead, amount: 2048, script: , vendor: ,cores -> name: cores, amount: 2, script: , vendor:
,memory -> name: memory, amount: 6144, script: , vendor: ,gpu -> name: gpu, amount: 2, script: ./getGpus,
vendor: , task resources: cpus -> name: cpus, amount: 1.0,gpu -> name: gpu, amount: 1.0
> treq.cpus(2).resource("gpu", 2)
> rpb.require(treq)
> val rpNew = rpb.build()
> rpNew
: org.apache.spark.resource.ResourceProfile = Profile: id = 3, executor resources: memoryOverhead ->
name: memoryOverhead, amount: 2048, script: , vendor: ,cores -> name: cores, amount: 2, script: , vendor:
,memory -> name: memory, amount: 6144, script: , vendor: ,gpu -> name: gpu, amount: 2, script: ./getGpus,
vendor: , task resources: cpus -> name: cpus, amount: 2.0,gpu -> name: gpu, amount: 2.0
Use Case Example
End to End Pipeline
ETL Using Rapids Accelerator For Spark
Rapids Accelerator For Spark
• Run Spark on a GPU to accelerate processing
• combines the power of the RAPIDS cuDF library and the scale of the Spark distributed
computing framework
• Spark SQL and DataFrames
• Requires Spark 3.0+
• No user code changes
• If operation not supported, run on CPU like normal
• built-in accelerated shuffle based on UCX that can be configured to
leverage GPU-to-GPU communication and RDMA capabilities’
ETL Technology Stack
Dask cuDF
cuDF, Pandas
Python
Cython
cuDF C++
CUDA Libraries
CUDA
Java
JNI bindings
Spark dataframes,
Scala, PySpark
Rapids Accelerator For Apache Spark (Plugin)
DISTRIBUTED SCALE-OUT SPARK APPLICATIONS
APACHE SPARK CORE
RAPIDS
Accelerator
for Spark
Spark SQL API DataFrame API Spark Shuffle
if gpu_enabled(operation, data_type)
call-out to RAPIDS
else
execute standard Spark operation
● Custom Implementation of Spark
Shuffle
● Optimized to use RDMA and GPU-
to-GPU direct communication
JNI bindings
Mapping From Java/Scala to C++
RAPIDS C++ Libraries UCX Libraries
CUDA
JNI bindings
Mapping From Java/Scala to C++
Spark SQL & Dataframe Compilation Flow
DataFrame
Logical Plan
Physical Plan
bar.groupBy(
col(”product_id”),
col(“ds”))
.agg(
maxcol(“price”)) -
min(col(“p(rice”)).alias(“range”))
SELECT product_id, ds,
max(price) – min(price) AS
range FROM bar GROUP BY
product_id, ds
QUERY
GPU
PHYSICAL
PLAN
GPU Physical Plan
RAPIDS SQL
Plugin
RDD[InternalRow]
RDD[ColumnarBatch]
NDS Query 38 Results
Entire query is GPU accelerated
CPU Cluster: Driver: 1 x m5dn.large;
Workers: 8 x m5dn.2xlarge
On-demand cluster cost (US West): $4.488/hr
GPU Cluster: Driver: 1 x m5dn.large;
Workers: 8 x g4dn.2xlarge
On-demand cluster cost (US West): $6.152/hr
163.0
53.2
0.0
40.0
80.0
120.0
160.0
200.0
CPU: 8 x m5dn.2xlarge
(64-core 256GB)
GPU: 8 x g4dn.2xlarge
(64-core 256GB 8xT4
GPU)
Time
(secs)
Query Time
$0.20
$0.09
$0.00
$0.05
$0.10
$0.15
$0.20
$0.25
CPU: 8 x m5dn.2xlarge
(64-core 256GB)
GPU: 8 x g4dn.2xlarge
(64-core 256GB 8xT4 GPU)
Total Costs
3X Speed-up 55% Cost Saving
Deep Learning
Horovod Introduction
• Distributed Deep learning training framework
• TensorFlow, Keras, PyTorch, Apache MXNet
• High Performance features
• NCCL< GpuDirect, RDMA, tensor fusion
• Easy to use
• Just 5 lines of Python
• Open Source
• Linux Foundation AI Foundation
• Easy to install
• pip install horovod
horovod.ai
Demo
End to End Horovod Demo
Future Enhancements
Future Enhancements
• Collect feedback from users
• Allow setting certain configs – like dynamic allocation
• Fitting new ResourceProfiles into existing containers
• Better cleanup of ResourceProfiles
• Catalyst internally
Other Performance Enhancements
Other Enhancements
• Pluggable Caching
• Allows developers to try different caching solutions
• Custom GPU implementation
Questions
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.

More Related Content

What's hot

Fine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsFine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsDatabricks
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code GenerationDatabricks
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQLDatabricks
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekVenkata Naga Ravi
 
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedInMagnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedInDatabricks
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangDatabricks
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDatabricks
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeDatabricks
 
Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0Databricks
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...confluent
 
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...Spark Summit
 
Apache Spark overview
Apache Spark overviewApache Spark overview
Apache Spark overviewDataArt
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsDatabricks
 
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015Chris Fregly
 
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...Spark Summit
 
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaTuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaDatabricks
 
How to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkHow to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkDatabricks
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemDatabricks
 

What's hot (20)

Fine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsFine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark Jobs
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code Generation
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQL
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
 
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedInMagnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
 
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
 
Apache Spark overview
Apache Spark overviewApache Spark overview
Apache Spark overview
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
 
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
 
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaTuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
 
How to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkHow to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache Spark
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
 

Similar to Stage Level Scheduling Improving Big Data and AI Integration

Deep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.xDeep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.xDatabricks
 
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPBuild Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPDatabricks
 
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...Databricks
 
Spark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit
 
SFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdfSFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdfChester Chen
 
Spark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting GuideSpark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting GuideIBM
 
Deploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using SparkDeploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using SparkJen Aman
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudDatabricks
 
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...Chester Chen
 
A really really fast introduction to PySpark - lightning fast cluster computi...
A really really fast introduction to PySpark - lightning fast cluster computi...A really really fast introduction to PySpark - lightning fast cluster computi...
A really really fast introduction to PySpark - lightning fast cluster computi...Holden Karau
 
Mixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache SparkMixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache SparkVMware Tanzu
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkRahul Jain
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAlluxio, Inc.
 
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...Databricks
 
Spark on Yarn
Spark on YarnSpark on Yarn
Spark on YarnQubole
 
10 things i wish i'd known before using spark in production
10 things i wish i'd known before using spark in production10 things i wish i'd known before using spark in production
10 things i wish i'd known before using spark in productionParis Data Engineers !
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupRafal Kwasny
 
End-to-End Deep Learning with Horovod on Apache Spark
End-to-End Deep Learning with Horovod on Apache SparkEnd-to-End Deep Learning with Horovod on Apache Spark
End-to-End Deep Learning with Horovod on Apache SparkDatabricks
 

Similar to Stage Level Scheduling Improving Big Data and AI Integration (20)

Deep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.xDeep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.x
 
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPBuild Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
 
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
 
Spark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca Canali
 
SFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdfSFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdf
 
Spark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting GuideSpark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting Guide
 
Deploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using SparkDeploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using Spark
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the Cloud
 
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
 
A really really fast introduction to PySpark - lightning fast cluster computi...
A really really fast introduction to PySpark - lightning fast cluster computi...A really really fast introduction to PySpark - lightning fast cluster computi...
A really really fast introduction to PySpark - lightning fast cluster computi...
 
Mixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache SparkMixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache Spark
 
Exploiting GPUs in Spark
Exploiting GPUs in SparkExploiting GPUs in Spark
Exploiting GPUs in Spark
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Spark on YARN
Spark on YARNSpark on YARN
Spark on YARN
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
 
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
 
Spark on Yarn
Spark on YarnSpark on Yarn
Spark on Yarn
 
10 things i wish i'd known before using spark in production
10 things i wish i'd known before using spark in production10 things i wish i'd known before using spark in production
10 things i wish i'd known before using spark in production
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
End-to-End Deep Learning with Horovod on Apache Spark
End-to-End Deep Learning with Horovod on Apache SparkEnd-to-End Deep Learning with Horovod on Apache Spark
End-to-End Deep Learning with Horovod on Apache Spark
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
 

Recently uploaded

Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...KarteekMane1
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxSimranPal17
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Milind Agarwal
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 

Recently uploaded (20)

Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 

Stage Level Scheduling Improving Big Data and AI Integration

  • 1. Stage Level Scheduling Improving Big Data and AI integration Thomas Graves Software Engineer at NVIDIA Spark PMC
  • 2. Agenda § Resource Scheduling § Stage Level Scheduling § Use Case Example § Demo
  • 4. Resource Scheduling • Driver • Cores • Memory • Accelerators (GPU/FPGA/etc) • Executors • Cores • Memory (overhead, pyspark, heap, offheap) • Accelerators (GPU/FPGA/etc) • Tasks (requirements) • CPUs • Accelerators (GPU/FPGA/etc)
  • 5. Resource Scheduling • Tasks Per Executor • Executor Resources / Task Requirements • Configs spark.driver.cores=1 spark.executor.cores=4 spark.task.cpus=1 spark.driver.memory=4g spark.executor.memory=4g spark.executor.memoryOverhead=2g spark.driver.resource.gpu.amount=1 spark.driver.resource.gpu.discoveryScript=./getGpuResources.sh spark.executor.resource.gpu.amount=1 spark.executor.resource.gpu.discoveryScript=./getGpuResources.sh spark.task.resource.gpu.amount=0.25
  • 7. Overview Spark ETL Stage Spark ML Stage NODE NODE GPU CPU CPU
  • 8. Stage Level Scheduling • Stage level resource scheduling (SPARK-27495) • Specify resource requirements per RDD operation • Spark dynamically allocates containers to meet resource requirements • Spark schedules tasks on appropriate containers • Benefits • Hardware utilization and cost • Ease of programming • Application no longer required split ETL and Deep Learning into separate applications • Pipeline simplification
  • 9. Use Cases • Beneficial any time the user wants to change container resources between stages in a single Spark application • ETL to Deep Learning • Skewed data • Data size large in certain stages • Jobs that use caching, switch to higher memory containers during those stages
  • 10. Resources Supported • Executor Resources • Cores • Heap Memory • OffHeap Memory • Pyspark Memory • Memory Overhead • Additional Resources (GPUs, etc) • Task Resources • CPUs • Additional Resources (GPUs, etc)
  • 11. Requirements • Spark 3.1.1 • Dynamic Allocation with External Shuffle Service or Shuffle tracking enabled • YARN and Kubernetes • RDD API only • Scala, Java, Python
  • 12. Implementation Details • New container acquired with new ResourceProfile • Does NOT try to fit into existing container with different ResourceProfile (Future Enhancement) • Unused containers idle timeout • Default to one ResourceProfile per stage • Config to allow multiple ResourceProfiles per stage • Multiple profiles will be merged with simple max of each resource
  • 13. YARN Implementation Details • External Shuffle Service and Dynamic Allocation • YARN Container Priority – ResourceProfile Id becomes container priority • YARN lower numbers are higher priority • Job Server type scenario that may come into affect • GPU and FPGA predefined, other resources require additional configurations • Custom resources via spark.yarn.executor.resource.* only apply in default profile – do not propogate because no way to override • Discovery script must be accessible – sent with job submission
  • 14. Kubernetes Implementation Details • Requires shuffle tracking enabled (spark.dynamicAllocaiton.shuffleTracking.enabled) • May not idel timeout if have shuffle data on the node • Result in more cluster resource used • spark.dynamicAllocaiton.shuffleTracking.timeout • Pod Template Behavior • Resource in Pod Template only used in default profile • Specify all resources needed in the ResourceProfile
  • 15. UI Screen Shots --executor-cores 2 --conf spark.executor.resource.gpu.amount=1 --conf spark.task.resource.gpu.amount=0.5
  • 17. API > import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests} > val rpb = new ResourceProfileBuilder() > val ereq = new ExecutorResourceRequests() > val treq = new TaskResourceRequests() > ereq.cores(4).memory("6g”).memoryOverhead("2g”).resource("gpu", 2, "./getGpus") > treq.cpus(4).resource("gpu", 2) > rpb.require(ereq) > rpb.require(treq) > val rp = rpb.build() // use the ResourceProfile with the RDD > val mlRdd = df.rdd.withResources(rp) > mlRdd.mapPartitions { x => // feed data into ML and get result }.collect()
  • 20. API > rpb Profile executor resources: ArrayBuffer(memoryOverhead=name: memoryOverhead, amount: 2048, script: , vendor: , cores=name: cores, amount: 4, script: , vendor: , memory=name: memory, amount: 6144, script: , vendor: , gpu=name: gpu, amount: 2, script: ./getGpus, vendor: ), task resources: ArrayBuffer(cpus=name: cpus, amount: 4.0, gpu=name: gpu, amount: 2.0) > mlRdd.getResourceProfile : org.apache.spark.resource.ResourceProfile = Profile: id = 1, executor resources: memoryOverhead -> name: memoryOverhead, amount: 2048, script: , vendor: ,cores -> name: cores, amount: 4, script: , vendor: ,memory -> name: memory, amount: 6144, script: , vendor: ,gpu -> name: gpu, amount: 2, script: ./getGpus, vendor: , task resources: cpus -> name: cpus, amount: 4.0,gpu -> name: gpu, amount: 2.0
  • 21. API - Mutable vs Immutable > ereq.cores(2).memory("6g”).memoryOverhead("2g”).resource("gpu", 2, "./getGpus") > treq.cpus(1).resource("gpu", 1) > rpb.require(ereq).require(treq) > val rp = rpb.build() > rp : org.apache.spark.resource.ResourceProfile = Profile: id = 2, executor resources: memoryOverhead -> name: memoryOverhead, amount: 2048, script: , vendor: ,cores -> name: cores, amount: 2, script: , vendor: ,memory -> name: memory, amount: 6144, script: , vendor: ,gpu -> name: gpu, amount: 2, script: ./getGpus, vendor: , task resources: cpus -> name: cpus, amount: 1.0,gpu -> name: gpu, amount: 1.0 > treq.cpus(2).resource("gpu", 2) > rpb.require(treq) > val rpNew = rpb.build() > rpNew : org.apache.spark.resource.ResourceProfile = Profile: id = 3, executor resources: memoryOverhead -> name: memoryOverhead, amount: 2048, script: , vendor: ,cores -> name: cores, amount: 2, script: , vendor: ,memory -> name: memory, amount: 6144, script: , vendor: ,gpu -> name: gpu, amount: 2, script: ./getGpus, vendor: , task resources: cpus -> name: cpus, amount: 2.0,gpu -> name: gpu, amount: 2.0
  • 22. Use Case Example End to End Pipeline
  • 23. ETL Using Rapids Accelerator For Spark
  • 24. Rapids Accelerator For Spark • Run Spark on a GPU to accelerate processing • combines the power of the RAPIDS cuDF library and the scale of the Spark distributed computing framework • Spark SQL and DataFrames • Requires Spark 3.0+ • No user code changes • If operation not supported, run on CPU like normal • built-in accelerated shuffle based on UCX that can be configured to leverage GPU-to-GPU communication and RDMA capabilities’
  • 25. ETL Technology Stack Dask cuDF cuDF, Pandas Python Cython cuDF C++ CUDA Libraries CUDA Java JNI bindings Spark dataframes, Scala, PySpark
  • 26. Rapids Accelerator For Apache Spark (Plugin) DISTRIBUTED SCALE-OUT SPARK APPLICATIONS APACHE SPARK CORE RAPIDS Accelerator for Spark Spark SQL API DataFrame API Spark Shuffle if gpu_enabled(operation, data_type) call-out to RAPIDS else execute standard Spark operation ● Custom Implementation of Spark Shuffle ● Optimized to use RDMA and GPU- to-GPU direct communication JNI bindings Mapping From Java/Scala to C++ RAPIDS C++ Libraries UCX Libraries CUDA JNI bindings Mapping From Java/Scala to C++
  • 27. Spark SQL & Dataframe Compilation Flow DataFrame Logical Plan Physical Plan bar.groupBy( col(”product_id”), col(“ds”)) .agg( maxcol(“price”)) - min(col(“p(rice”)).alias(“range”)) SELECT product_id, ds, max(price) – min(price) AS range FROM bar GROUP BY product_id, ds QUERY GPU PHYSICAL PLAN GPU Physical Plan RAPIDS SQL Plugin RDD[InternalRow] RDD[ColumnarBatch]
  • 28. NDS Query 38 Results Entire query is GPU accelerated CPU Cluster: Driver: 1 x m5dn.large; Workers: 8 x m5dn.2xlarge On-demand cluster cost (US West): $4.488/hr GPU Cluster: Driver: 1 x m5dn.large; Workers: 8 x g4dn.2xlarge On-demand cluster cost (US West): $6.152/hr 163.0 53.2 0.0 40.0 80.0 120.0 160.0 200.0 CPU: 8 x m5dn.2xlarge (64-core 256GB) GPU: 8 x g4dn.2xlarge (64-core 256GB 8xT4 GPU) Time (secs) Query Time $0.20 $0.09 $0.00 $0.05 $0.10 $0.15 $0.20 $0.25 CPU: 8 x m5dn.2xlarge (64-core 256GB) GPU: 8 x g4dn.2xlarge (64-core 256GB 8xT4 GPU) Total Costs 3X Speed-up 55% Cost Saving
  • 30. Horovod Introduction • Distributed Deep learning training framework • TensorFlow, Keras, PyTorch, Apache MXNet • High Performance features • NCCL< GpuDirect, RDMA, tensor fusion • Easy to use • Just 5 lines of Python • Open Source • Linux Foundation AI Foundation • Easy to install • pip install horovod horovod.ai
  • 31. Demo
  • 32. End to End Horovod Demo
  • 34. Future Enhancements • Collect feedback from users • Allow setting certain configs – like dynamic allocation • Fitting new ResourceProfiles into existing containers • Better cleanup of ResourceProfiles • Catalyst internally
  • 36. Other Enhancements • Pluggable Caching • Allows developers to try different caching solutions • Custom GPU implementation
  • 38. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.