Getting Apache Spark Customers to Production
Cloudera, Inc.
from Kostas Sakellis
1.
Getting Spark Customers to Production
Kostas Sakellis
© Cloudera, Inc. All rights reserved.
2.
Me
• Software Engineer at Cloudera
• Contributor to Apache Spark
• Before that, contributed to Cloudera Manager
3.
Our customers
• Various degrees of sophistication with Spark
• In all stages of development
  • From POC to production deployments
• 95% use Spark on YARN*
• Biweekly analysis of tickets
4.
WARNING: This is biased!
5.
Building a proof of concept!
Courtesy of: http://www.nefloridadesign.com/mbimages/6.jpg
6.
“Why is my job failing?”
7.
“Why is my job slow?”
8.
Misconfiguration accounts for 20% of job failures
Courtesy of: http://blog.sdrock.com/pastors/files/2013/06/time-clock.jpg
9.
Resource Declaration
• It is not easy to know what you need and how to specify it
• Compute:
  • --num-executors vs. --executor-cores
• Memory:
  • --executor-memory
  • The YARN container must also fit the JVM overhead
  • Need to do the math yourself
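The speaker note for this slide asks whether to use 10 executors with 1 core or 5 executors with 2 cores. Below is a plain-Scala sketch of that arithmetic with no Spark dependency. `ExecutorLayout` and `LayoutDemo` are hypothetical names. The sketch treats `--executor-memory` as heap shared by all cores of one executor and leaves the YARN overhead out.

```scala
// Hypothetical helper: what a spark-submit resource declaration adds up to.
// Mirrors --num-executors, --executor-cores and --executor-memory (heap only;
// the YARN memory overhead is deliberately not included here).
final case class ExecutorLayout(numExecutors: Int, executorCores: Int, executorMemoryMb: Int) {
  require(numExecutors > 0 && executorCores > 0 && executorMemoryMb > 0)

  /** Total concurrent task slots across the application. */
  def totalCores: Int = numExecutors * executorCores

  /** Total executor heap requested across the application. */
  def totalHeapMb: Int = numExecutors * executorMemoryMb

  /** Executor memory is shared by all cores of that executor,
    * so this is roughly the heap each concurrently running task can use. */
  def heapPerCoreMb: Int = executorMemoryMb / executorCores
}

object LayoutDemo {
  val tenByOne  = ExecutorLayout(numExecutors = 10, executorCores = 1, executorMemoryMb = 2048)
  val fiveByTwo = ExecutorLayout(numExecutors = 5, executorCores = 2, executorMemoryMb = 2048)

  def main(args: Array[String]): Unit =
    for (l <- Seq(tenByOne, fiveByTwo))
      println(s"$l -> cores=${l.totalCores}, heap=${l.totalHeapMb} MB, per core=${l.heapPerCoreMb} MB")
}
```

Both layouts provide ten task slots. If `--executor-memory` is left unchanged, however, the second layout halves both the total heap and the heap available to each task.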
10.
Dynamic Allocation
• Let Spark do the work for you
• Available since Spark 1.2*
• No need to specify compute a priori
• Limitation: still required to specify cores
• In future:
  • Allow specification of “task size”
  • Dynamically allocate cores
11.
YARN Configuration mismatch
• Compute:
  • yarn.nodemanager.resource.cpu-vcores
  • yarn.scheduler.maximum-allocation-vcores
• Memory:
  • yarn.nodemanager.resource.memory-mb
  • yarn.scheduler.maximum-allocation-mb
12.
YARN Configuration mismatch
• Common to ask for more resources than allowed
• Future work:
  • Exposing relevant YARN configurations in Spark UI
  • Requires changes to YARN itself
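A pre-flight check along these lines can catch the mismatch before a job is submitted. This is a hypothetical helper, not part of Spark or YARN. It compares one executor's cores and container size (executor memory plus overhead) against the four YARN settings from the previous slide. The values are typed in by hand rather than read from a cluster. How an oversized request actually fails depends on the YARN and Spark versions in use.

```scala
// Hypothetical pre-flight check: does one executor container fit the YARN limits?
// All values are plain numbers you would copy from yarn-site.xml;
// nothing here talks to a real cluster.
final case class YarnLimits(
    nodeVcores: Int,          // yarn.nodemanager.resource.cpu-vcores
    maxAllocationVcores: Int, // yarn.scheduler.maximum-allocation-vcores
    nodeMemoryMb: Int,        // yarn.nodemanager.resource.memory-mb
    maxAllocationMb: Int      // yarn.scheduler.maximum-allocation-mb
)

object YarnFit {
  /** Returns human-readable problems; an empty list means the request fits.
    * containerMb should already include the executor memory overhead. */
  def problems(executorCores: Int, containerMb: Int, limits: YarnLimits): List[String] = {
    val checks = List(
      (executorCores > limits.maxAllocationVcores) ->
        s"executor cores $executorCores > yarn.scheduler.maximum-allocation-vcores ${limits.maxAllocationVcores}",
      (executorCores > limits.nodeVcores) ->
        s"executor cores $executorCores > yarn.nodemanager.resource.cpu-vcores ${limits.nodeVcores}",
      (containerMb > limits.maxAllocationMb) ->
        s"container $containerMb MB > yarn.scheduler.maximum-allocation-mb ${limits.maxAllocationMb}",
      (containerMb > limits.nodeMemoryMb) ->
        s"container $containerMb MB > yarn.nodemanager.resource.memory-mb ${limits.nodeMemoryMb}"
    )
    // Keep only the messages whose condition is violated.
    checks.collect { case (true, msg) => msg }
  }
}
```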
13.
Another YARN goodie…
Container [pid=63375,containerID=container_1388158490598_0001_01_000003] is running beyond physical memory limits. Current usage: 2.1 GB of 2 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used. Killing container. [...]
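When triaging tickets, it can help to pull the numbers out of this message programmatically. Below is a hypothetical plain-Scala helper (`ContainerKill` is an illustrative name). It extracts the physical memory used and the limit, in MB. It assumes the message wording shown above, and other Hadoop versions may phrase it differently.

```scala
// Hypothetical helper: extract the physical-memory numbers from a NodeManager
// "running beyond physical memory limits" message like the one on the slide.
object ContainerKill {
  private val Usage =
    """Current usage: ([\d.]+) ([KMGT]?B) of ([\d.]+) ([KMGT]?B) physical memory used""".r

  // Normalize a value with its unit to megabytes.
  private def toMb(value: Double, unit: String): Double = unit match {
    case "KB" => value / 1024
    case "MB" => value
    case "GB" => value * 1024
    case "TB" => value * 1024 * 1024
    case _    => value / (1024 * 1024) // plain bytes
  }

  /** Returns (usedMb, limitMb) if the message matches, else None. */
  def physicalUsageMb(log: String): Option[(Double, Double)] =
    Usage.findFirstMatchIn(log).map { m =>
      (toMb(m.group(1).toDouble, m.group(2)), toMb(m.group(3).toDouble, m.group(4)))
    }
}
```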
14.
Memory allocation
[Figure: nested memory layout of an executor on a YARN node]
• yarn.nodemanager.resource.memory-mb
  • Executor Container
    • spark.yarn.executor.memoryOverhead (7%; 10% in Spark 1.4)
    • spark.executor.memory
      • spark.shuffle.memoryFraction (0.4)
      • spark.storage.memoryFraction (0.6)
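The overhead percentages above determine how large a container must be. This sketch assumes the Spark 1.x default for `spark.yarn.executor.memoryOverhead`, which is the larger of (percentage × `spark.executor.memory`) and 384 MB. Check that formula against the documentation for your Spark version. `ContainerSize` is a hypothetical name.

```scala
// Sketch of the executor container size YARN is asked for, assuming the
// spark.yarn.executor.memoryOverhead default of max(factor * executorMemory, 384 MB),
// with factor 0.07 before Spark 1.4 and 0.10 from Spark 1.4 (as on the slide).
object ContainerSize {
  val MinOverheadMb = 384

  def overheadMb(executorMemoryMb: Int, factor: Double): Int =
    math.max((factor * executorMemoryMb).toInt, MinOverheadMb)

  /** spark.executor.memory plus overhead: what must fit within the YARN memory limits. */
  def containerMb(executorMemoryMb: Int, factor: Double = 0.10): Int =
    executorMemoryMb + overheadMb(executorMemoryMb, factor)
}
```

Under the 1.4 default, a 2048 MB executor needs a 2432 MB container. A YARN limit equal to `spark.executor.memory` is therefore never enough.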
15.
YARN Overhead
• Future work:
  • Better understanding of off-heap allocations
  • Improve memory usage visibility
16.
Run program through all our data
Courtesy of: https://conniehallscott.files.wordpress.com/2013/01/411748_538971446114753_1125606225_o.jpg
17.
Data-dependent tuning
• As data rates change, re-tuning Spark is usually necessary
• Spark is sensitive to shuffle spills
• The most common knob we modify is…
18.
Partitions, Partitions, Partitions!
19.
GC Stalls
20.
Partitions
• Smaller is often better
• Parameterized partition size
  • reduceByKey(…, nPartitions)
  • Parameterize application
• Future work:
  • Dynamically determine # of partitions (SPARK-4630)
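Below is a plain-Scala model of the two ideas on this slide, written without a Spark dependency. First, the partition count comes from the application's arguments. The `--partitions` flag and the default of 200 are hypothetical. Second, keys are routed the way a hash partitioner does it: a non-negative `hashCode` modulo the partition count, assuming non-null keys. In a real job, the parsed value would be passed as the `nPartitions` argument of `reduceByKey`.

```scala
// Plain-Scala model (no Spark dependency) of a parameterized partition count
// and hash-style routing of keys to partitions.
object Partitioning {
  val DefaultPartitions = 200 // hypothetical application default, not a Spark setting

  /** Hypothetical CLI convention: "--partitions N" overrides the default. */
  def numPartitions(args: Array[String]): Int =
    args.sliding(2).collectFirst {
      case Array("--partitions", n) => n.toInt
    }.getOrElse(DefaultPartitions)

  /** Non-negative hashCode modulo n (keys assumed non-null). */
  def partitionFor(key: Any, n: Int): Int = {
    val raw = key.hashCode % n
    if (raw < 0) raw + n else raw
  }

  /** Local stand-in for reduceByKey(f, n): route pairs to partitions, then reduce per key. */
  def reduceByKeyLocal[K, V](pairs: Seq[(K, V)], n: Int)(f: (V, V) => V): Map[Int, Map[K, V]] =
    pairs.groupBy { case (k, _) => partitionFor(k, n) }.map { case (p, kvs) =>
      p -> kvs.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).reduce(f) }
    }
}
```

Keeping the count outside the code means re-tuning for new data rates only requires changing a launch argument, not rebuilding the job.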
21.
But for now?
• Easy answer:
  • Keep multiplying by 1.5 and see what works
• Harder answer:
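Here is the "easy answer" as a tiny sketch. Start from the current partition count and generate candidates by multiplying by 1.5 and rounding up. Then run each candidate on representative data and keep the one that performs best. `PartitionSearch` is a hypothetical name.

```scala
// Sketch of the "keep multiplying by 1.5" heuristic:
// produce a sequence of partition counts to benchmark, rounding up each step.
object PartitionSearch {
  def candidates(start: Int, steps: Int): List[Int] =
    Iterator.iterate(start)(n => math.ceil(n * 1.5).toInt).take(steps).toList
}
```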
22.
Shuffle less!
23.
Shuffles
[Figure: narrow dependencies vs. wide dependency]
24.
ReduceByKey when Possible
• ReduceByKey allows a map-side combine
    parsed
      .map { line => (line.level, 1) }
      .reduceByKey { (a, b) => a + b }
      .collect()
• GroupByKey transfers all the data
    parsed
      .map { line => (line.level, 1) }
      .groupByKey()
      .map { case (level, counts) => (level, counts.sum) }
      .collect()
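To see why `reduceByKey` moves less data, here is a plain-Scala model that counts the `(level, 1)` records each pipeline would send across the network. It is an illustration, not Spark's shuffle implementation. Each inner `Seq` stands in for one map-side partition of log levels. With a map-side combine, at most one record per distinct key per partition is shuffled. `ShuffleVolume` is a hypothetical name.

```scala
// Plain-Scala model of the two pipelines on the slide, counting shuffled records.
object ShuffleVolume {
  type Partition = Seq[String]

  /** groupByKey: every (level, 1) record crosses the network. */
  def groupByKeyRecords(parts: Seq[Partition]): Int = parts.map(_.size).sum

  /** reduceByKey: records are combined per key inside each partition first,
    * so only one record per distinct key per partition is shuffled. */
  def reduceByKeyRecords(parts: Seq[Partition]): Int = parts.map(_.distinct.size).sum

  /** Both pipelines end with the same per-level counts. */
  def counts(parts: Seq[Partition]): Map[String, Int] =
    parts.flatten.groupBy(identity).map { case (level, xs) => level -> xs.size }
}
```

With two partitions `("ERROR", "WARN", "ERROR")` and `("ERROR", "INFO", "INFO")`, `groupByKey` shuffles 6 records while `reduceByKey` shuffles 4. The gap grows as the number of repeated keys per partition grows.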
25.
ReduceByKey when Possible
[Figure: data shuffled with ReduceByKey vs. GroupByKey]
26.
Security, now it’s getting serious.
Courtesy of: https://www.iti.illinois.edu/sites/default/files/Cybersecurity_image.jpg
27.
Authentication
• Kerberos – the necessary evil
• Ubiquitous amongst other services
  • YARN, HDFS, Hive, HBase, etc.
• Spark utilizes delegation tokens
28.
Encryption
• Control plane
• File distribution
• Block Manager
• User UI / REST API
• Data-at-rest (shuffle files)
Tracking: SPARK-6028 (replace with Netty); replace with Netty (Spark 1.4); SPARK-2750 (SSL); SPARK-5682
29.
Authorization
• Enterprises have sensitive data
• Beyond HDFS file permissions
• Partial access to data
• Column-level granularity
• Apache Sentry
• HDFS-Sentry synchronization plugin
30.
Customers often have shared infrastructure
Courtesy of: https://radioglobalistic.files.wordpress.com/2011/02/lagos-traffic.jpg
31.
Multi-tenancy
• Cluster utilization is the top metric
  • Target: 70-80% utilization
• Mixed workloads from mixed customers
• We recommend YARN
  • Built-in resource manager
32.
Underutilized Clusters
Courtesy of: http://media.nbclosangeles.com/images/1200*675/60-freeway-repair-dec16-2-empty.JPG
33.
Dynamic Allocation
• Allows jobs to scale up and down according to load
• Knobs to control min, max and initial size
• Future work:
  • Target: dynamic allocation enabled by default
  • Data locality & caching
  • Open question with Streaming
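The min, max and initial knobs correspond to Spark configuration keys. Below they are collected as key/value pairs with placeholder numbers that would need sizing per workload. On YARN, dynamic allocation in Spark 1.x also relies on the external shuffle service (`spark.shuffle.service.enabled`). `DynamicAllocationConf` and its helpers are hypothetical.

```scala
// Sketch of dynamic allocation settings as Spark configuration key/value pairs.
// Keys are Spark settings; the numbers are placeholders, not recommendations.
object DynamicAllocationConf {
  val settings: Map[String, String] = Map(
    "spark.dynamicAllocation.enabled"          -> "true",
    "spark.shuffle.service.enabled"            -> "true", // external shuffle service
    "spark.dynamicAllocation.minExecutors"     -> "2",
    "spark.dynamicAllocation.initialExecutors" -> "4",
    "spark.dynamicAllocation.maxExecutors"     -> "50",
    "spark.executor.cores"                     -> "2"     // cores still have to be declared
  )

  /** The bounds must be ordered: min <= initial <= max. */
  def boundsAreConsistent(conf: Map[String, String]): Boolean = {
    def get(k: String) = conf(s"spark.dynamicAllocation.$k").toInt
    get("minExecutors") <= get("initialExecutors") && get("initialExecutors") <= get("maxExecutors")
  }

  /** Render the settings as spark-submit --conf flags, sorted by key. */
  def asSubmitArgs(conf: Map[String, String]): Seq[String] =
    conf.toSeq.sortBy(_._1).flatMap { case (k, v) => Seq("--conf", s"$k=$v") }
}
```

The resulting pairs can be passed as `--conf key=value` flags to `spark-submit`, or set programmatically, for example with `SparkConf.setAll`.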
34.
Thank you
We’re Hiring!
Speaker notes
Let's talk about the issues we have seen our customers run into as they try to get Spark into production.
In scope:
- Focus on operational issues
- Not on building the code itself
- Experience from our customer support tickets
Spark makes building a proof of concept with a subset of data relatively easy. But then things go wrong. Plug for my talk at Hadoop Summit.
--num-executors vs. --executor-cores? 10 executors with 1 core, or 5 executors with 2 cores? Memory: --executor-memory is the total for one executor, shared across all of its cores.
This shows up in the YARN NodeManager logs
Max partition size is 2 GB. Small partitions help deal with stragglers. Small partitions avoid overhead.
Fastest way to shuffle a lot of data: don’t shuffle. Second fastest way to shuffle a lot of data: shuffle a small amount of data.
ReduceByKey: data is merged together before it is serialized and sent over the network. GroupByKey: higher serialization and network transfer costs.
Control plane, file distribution, Block Manager, user UI / REST API, data-at-rest (shuffle files).
Dynamic allocation:
- streaming
- locality (being worked on)
- making it even better