Yarnthug2014
Joseph Niemiec
Toronto Hadoop User Group 2014 YARN Roadmap
1.
Apache Hadoop YARN – Yet Another Resource Negotiator
© Hortonworks Inc. 2011–2014. All Rights Reserved.
2.
Quick Bio
• Hadoop user for ~3 years
• One of the co-authors of Apache Hadoop YARN
• Originally used Hadoop for location-based services
  – Destination prediction
  – Traffic analysis
  – Effects of weather at client locations on call-center call types
• Pending patent in the automotive/telematics domain
• Defensive paper on M2M validation
• Started on analytics to get better at an MMORPG
3.
Agenda
• Hadoop history
• Hadoop 1 recap
• What is YARN
• MapReduce on YARN
• Multi-workload, multi-tenant
• Example YARN app
• YARN futures
• Short demo
• Takeaways & QA
4.
History Lesson
• Requirement #1: Scalability – the next-generation compute platform should scale horizontally to tens of thousands of nodes and concurrent applications
• Phase 0: The era of ad hoc clusters
  – Per user; ingress and egress every time
  – No data persisted on HDFS
• Phase 1: Hadoop on Demand (HOD)
  – Private "spin-up, spin-down" clusters on shared commodity hardware
  – Data persisted on HDFS as a shared service
• Phase 2: Dawn of the shared compute cluster
  – Multi-tenant shared MapReduce and HDFS
• Phase 3: Emergence of YARN
  – Multi-tenant, multi-workload, beyond MapReduce
5.
Hadoop 1 Recap
6.
Hadoop MapReduce Classic
• JobTracker
• TaskTracker
• Tasks
7.
MapReduce Classic: Limitations
• Scalability
  – Maximum cluster size: 4,000 nodes
  – Maximum concurrent tasks: 40,000
• Availability
  – A JobTracker failure kills all queued and running jobs
• Hard partition of resources into map and reduce slots
  – Low resource utilization
• Lacks support for alternate paradigms and services
  – Iterative applications implemented with MapReduce are 10x slower
8.
What is YARN
9.
What is YARN?
• Cluster operating system
  – Enables generic data-processing tasks via "containers"
  – Big compute (metal detectors) for big data (the haystack)
• ResourceManager
  – Global resource scheduler
• NodeManager
  – Per-machine agent
  – Manages container life-cycle and resource monitoring
• ApplicationMaster
  – Per-application master that manages application scheduling and task execution
  – e.g. the MapReduce ApplicationMaster
• Container
  – Basic unit of allocation
  – Fine-grained resource allocation across multiple resource types (memory, CPU, disk, network, GPU, etc.)
10.
What is YARN Good For?
• Compute for data processing
• Compute for embarrassingly parallel problems
  – Problems with tiny datasets and/or tasks that don't depend on one another
  – e.g. exhaustive search, trade simulations, climate models, genetic algorithms
• Beyond MapReduce
  – Enables multi-workload compute applications on a single shared infrastructure
  – Stream processing, NoSQL, search, in-memory, graphs, etc.
  – ANYTHING YOU CAN START FROM THE CLI!
• Slider and code reuse
  – Run existing applications on YARN: HBase on YARN, Storm on YARN
  – Reuse existing Java code in containers, making serial applications parallel
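The "anything you can start from the CLI" point is literal: Hadoop ships a DistributedShell example application (the same one Moya later uses as a skeleton) that runs an arbitrary shell command inside YARN containers. A hedged sketch; the jar path and version glob depend on your distribution:

```shell
# Run `uptime` in 4 YARN containers via the bundled DistributedShell example.
# The jar location below is an assumption; it varies by distribution.
DSHELL_JAR=$HADOOP_YARN_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-*.jar
yarn jar $DSHELL_JAR \
    org.apache.hadoop.yarn.applications.distributedshell.Client \
    -jar $DSHELL_JAR \
    -shell_command uptime \
    -num_containers 4
```

This requires a running cluster; the client submits an ApplicationMaster, which then requests the four containers and runs the command in each.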
11.
Multi-Workload Processing
• HADOOP 1.0 – single-use system (batch apps):
  – HDFS (redundant, reliable storage)
  – MapReduce (cluster resource management & data processing)
• HADOOP 2.0 – multi-purpose platform (batch, interactive, online, streaming, …):
  – HDFS2 (redundant, reliable storage)
  – YARN (cluster resource management)
  – MapReduce and other engines (data processing)
12.
Beyond MapReduce
[Slide diagram: YARN as a "Data Operating System" on top of HDFS, with multiple data-access engines running side by side – Batch (MapReduce), Script (Pig), SQL (Hive/Tez, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), Search (Solr), In-Memory analytics and ISV engines – surrounded by governance & integration (Falcon, Sqoop, Flume, NFS, WebHDFS), security (authentication, authorization, accounting, data protection across HDFS, YARN, Hive, Falcon, Knox), and operations (Ambari, Zookeeper, Oozie).]
13.
MapReduce on YARN
14.
Apache Hadoop MapReduce on YARN
• Original use case
• Most complex application to build
  – Data locality
  – Fault tolerance
  – ApplicationMaster recovery: checkpoint to HDFS
  – Intra-application priorities: maps vs. reduces
  – Security
  – Isolation
• Binary compatible with Apache Hadoop 1.x
15.
Efficiency Gains of MRv2
• Key optimizations
  – No hard segmentation of resources into map and reduce slots
  – The YARN scheduler is more efficient
  – The MRv2 framework has become more efficient than MRv1; for instance, the shuffle phase in MRv2 performs better thanks to a different web server
• Yahoo has over 30,000 nodes running YARN across over 365 PB of data
  – They calculate running about 400,000 jobs per day for about 10 million hours of compute time
  – They have also estimated a 60%–150% improvement in node usage per day
16.
An Example: Calculating Node Capacity
Important parameters:
• mapreduce.[map|reduce].memory.mb – the physical RAM hard limit enforced by Hadoop on the task
• mapreduce.[map|reduce].java.opts – the heap size of the JVM (-Xmx)
• yarn.scheduler.minimum-allocation-mb – the smallest container YARN will allocate
• yarn.nodemanager.resource.memory-mb – the amount of physical RAM on the node
17.
Calculating Node Capacity, Continued
• Let's pretend we need a 1 GB map and a 2 GB reduce
  – mapreduce.[map|reduce].java.opts = [-Xmx1g | -Xmx2g]
• Remember, a container has more overhead than just your heap! Add 512 MB to the container limit for overhead
  – mapreduce.[map|reduce].memory.mb = [1536 | 2560]
• We have 36 GB per node and minimum allocations of 512 MB
  – yarn.nodemanager.resource.memory-mb = 36864
  – yarn.scheduler.minimum-allocation-mb = 512
• Our 36 GB node can support 24 maps OR 14 reducers OR any combination allowed by the resources on the node
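The arithmetic on this slide can be checked with a small helper. This is a sketch, not part of Hadoop: the hypothetical `NodeCapacity` class assumes the scheduler rounds each container request up to a multiple of the minimum allocation before packing containers onto a node.

```java
public class NodeCapacity {
    /**
     * How many containers of a requested size (MB) fit on one node.
     * The request is rounded up to a multiple of
     * yarn.scheduler.minimum-allocation-mb, then packed into the
     * node's yarn.nodemanager.resource.memory-mb.
     */
    public static int maxContainers(int nodeMemoryMb, int requestMb, int minAllocMb) {
        int rounded = ((requestMb + minAllocMb - 1) / minAllocMb) * minAllocMb;
        return nodeMemoryMb / rounded;
    }

    public static void main(String[] args) {
        int nodeMb = 36864; // yarn.nodemanager.resource.memory-mb
        int minMb = 512;    // yarn.scheduler.minimum-allocation-mb
        // 1 GB map heap + 512 MB overhead = 1536 MB
        System.out.println(maxContainers(nodeMb, 1536, minMb)); // 24 maps
        // 2 GB reduce heap + 512 MB overhead = 2560 MB
        System.out.println(maxContainers(nodeMb, 2560, minMb)); // 14 reducers
    }
}
```

This reproduces the slide's numbers: 36864 / 1536 = 24 maps, and 36864 / 2560 = 14 reducers (rounded down).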
18.
Multi-Workload, Multi-Tenant
19.
YARN as the OS for the Data Lake
[Slide diagram: a ResourceManager with its Scheduler coordinating many NodeManagers. The same cluster simultaneously runs a batch MapReduce job (map1.1, map1.2, reduce1.1), an interactive SQL job as a DAG of vertices (vertex1.1.1, vertex1.1.2, vertex1.2.1, vertex1.2.2), and real-time daemons (nimbus0, nimbus1, nimbus2).]
20.
Multi-Tenant YARN
[Slide diagram: ResourceManager Scheduler with a hierarchical queue tree. root splits into Adhoc (10%), DW (60%), and Mrkting (30%); DW splits into Dev (10%), Reserved (20%), and Prod (70%); deeper levels split e.g. Prod 80% / Dev 20% and P0 70% / P1 30%.]
21.
Multi-Tenancy with the CapacityScheduler
• Queues
  – Economics as queue capacity
  – Hierarchical queues
• SLAs
  – Preemption
• Resource isolation
  – Linux: cgroups
  – MS Windows: Job Control
  – Roadmap: virtualization (Xen, KVM)
• Administration
  – Queue ACLs
  – Run-time reconfiguration for queues
  – Charge-back
[Slide diagram: Capacity Scheduler hierarchical queues – root splits into Adhoc 10%, DW 70%, Mrkting 20%; DW into Dev 10%, Reserved 20%, Prod 70%; Prod into Prod 80% / Dev 20% and P0 70% / P1 30%.]
22.
Capacity Scheduler Configuration
(The slide annotates the root queue, guaranteed queue capacities, maximum queue capacities, and sub-queues.)

ROOT (capacity 100)
  yarn.scheduler.capacity.root.capacity=100
  yarn.scheduler.capacity.root.queues=adhoc,batch,prod

ADHOC (guaranteed capacity 25, maximum capacity 50)
  yarn.scheduler.capacity.root.adhoc.acl_submit_applications=*
  yarn.scheduler.capacity.root.adhoc.capacity=25
  yarn.scheduler.capacity.root.adhoc.maximum-capacity=50
  yarn.scheduler.capacity.root.adhoc.state=RUNNING
  yarn.scheduler.capacity.root.adhoc.user-limit-factor=2

BATCH (guaranteed capacity 25, maximum capacity 75)
  yarn.scheduler.capacity.root.batch.capacity=25
  yarn.scheduler.capacity.root.batch.maximum-capacity=75

PROD (guaranteed capacity 50)
  yarn.scheduler.capacity.root.prod.acl_administer_queue=yarn
  yarn.scheduler.capacity.root.prod.acl_submit_applications=yarn,mapred
  yarn.scheduler.capacity.root.prod.capacity=50
  yarn.scheduler.capacity.root.prod.queues=reports,ops

PROD – reports sub-queue (capacity 80, maximum capacity 100)
  yarn.scheduler.capacity.root.prod.reports.state=RUNNING
  yarn.scheduler.capacity.root.prod.reports.capacity=80
  yarn.scheduler.capacity.root.prod.reports.maximum-capacity=100
  yarn.scheduler.capacity.root.prod.reports.user-limit-factor=3
  yarn.scheduler.capacity.root.prod.reports.minimum-user-limit-percent=20
  yarn.scheduler.capacity.prod.reports.maximum-applications=1

PROD – ops sub-queue (capacity 20, maximum capacity 50)
  yarn.scheduler.capacity.root.prod.ops.capacity=20
  yarn.scheduler.capacity.root.prod.ops.maximum-capacity=50
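These properties live in capacity-scheduler.xml on the ResourceManager. As a minimal sketch (showing only the adhoc queue from the slide's configuration):

```xml
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>adhoc,batch,prod</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.adhoc.capacity</name>
    <value>25</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.adhoc.maximum-capacity</name>
    <value>50</value>
  </property>
</configuration>
```

Queue changes can be applied at run time with `yarn rmadmin -refreshQueues`, which is how the run-time reconfiguration mentioned on the previous slide is done.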
23.
An Example YARN App
24.
Moya – Memcached on YARN
• Proof-of-concept project, minimum effort
• Used the Distributed Shell example as a skeleton
• GitHub: https://github.com/josephxsxn/moya
• Today:
  – Launches N jmemcached server daemons
  – Provides configuration information via Zookeeper
25.
Moya Architecture
[Slide diagram: a ResourceManager (with Scheduler) and a Zookeeper quorum (ZK1–ZK3) alongside many NodeManagers. The Moya ApplicationMaster (AM 1) runs in one container and launches memcached containers 1.1–1.3 on other nodes. The AM and containers exchange configuration info and heartbeats with Zookeeper; a client program using a memcache client reads the configuration from Zookeeper and sends memcached requests to the containers.]
26.
What's Inside the Moya AppMaster?
• Negotiates for all other application containers:

  // Request containers
  Priority pri = Records.newRecord(Priority.class);
  pri.setPriority(requestPriority);
  // Set up resource type requirements
  Resource capability = Records.newRecord(Resource.class);
  capability.setMemory(containerMemory);
  // Memory requirement, hosts, rack, priority, number of containers
  ContainerRequest request = new ContainerRequest(capability, null, null, pri, numContainers);
  resourceManager.addContainerRequest(request);

  // The ResourceManager calls us back with a list of allocated containers.
  // Launch each container by creating a ContainerLaunchContext.
  for (Container allocatedContainer : allocatedContainers) {
      ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
      // Command setup details on the next slide
      nmClient.startContainer(allocatedContainer, ctx);
  }
27.
Moya AppMaster, Continued
• Sets up the container, container environment, initial commands, and local resources:

  // Set up local resources, like our runnable jar
  Map<String, LocalResource> localResources = new HashMap<String, LocalResource>();
  LocalResource libsJarRsrc = Records.newRecord(LocalResource.class);
  libsJarRsrc.setType(LocalResourceType.FILE);
  libsJarRsrc.setVisibility(LocalResourceVisibility.APPLICATION);
  // Get the path which was provided by the client when it placed the libs on HDFS
  libsJarRsrc.setResource(ConverterUtils.getYarnUrlFromURI(new URI(libsPath)));
  localResources.put("Runnable.jar", libsJarRsrc);
  ctx.setLocalResources(localResources);

  // Set up the environment for the container
  Map<String, String> env = new HashMap<String, String>();
  StringBuilder classPathEnv = new StringBuilder(Environment.CLASSPATH.$())
      .append(File.pathSeparatorChar).append("./*");

  // Initial commands to start up the runnable jar
  Vector<CharSequence> vargs = new Vector<CharSequence>(5);
  vargs.add(Environment.JAVA_HOME.$() + "/bin/java -jar Runnable.jar");
  // Convert vargs to a string and add to the launch context
  ctx.setCommands(commands);
28.
A Look at the Moya Memcached Container
• Simple!
• Joins the Zookeeper /moya group with hostname and port as the member name:

  // Initialize the server
  daemon = new MemCacheDaemon<LocalCacheElement>();
  CacheStorage<Key, LocalCacheElement> storage;
  InetSocketAddress c = new InetSocketAddress(8555);
  storage = ConcurrentLinkedHashMap.create(
      ConcurrentLinkedHashMap.EvictionPolicy.FIFO, 15000, 67108864);
  daemon.setCache(new CacheImpl(storage));
  daemon.setBinary(false);
  daemon.setAddr(c);
  daemon.setIdleTime(120);
  daemon.setVerbose(true);
  daemon.start();
  // StartJettyTest.main(new String[] {}); // What's this?

  // Add self to the Zookeeper /moya/ group
  JoinGroup.main(new String[] {
      "172.16.165.155:2181",
      "moya",
      InetAddress.getLocalHost().getHostName() + ":" + c.getPort() });
29.
YARN Futures
30.
Targeted Futures (*asterisks included free)
• YARN node labels (YARN-796)
  – Needed for long-lived services
• Apache Slider*
  – A framework to support deployment and management of arbitrary applications on YARN
  – HBase on YARN, Storm on YARN
• Ambari deploys YARN HA*
• CPU scheduling (YARN-2)*
  – Helps enable Storm and HBase on YARN scheduling
• CGroups resource isolation across RAM and CPU (YARN-3)*
• Application Timeline Server (ATS) goes GA (YARN-321)*
  – Enables generic data collection captured in YARN apps
• MRv2 integration with ATS (MAPREDUCE-5858)*
• Docker container executor (YARN-1964)
• Work-preserving restart (YARN-1489)
31.
A Short Demo
32.
Preemption / Ambari REST Multi-Tenant Load Demo
• Multiple workloads hitting queues with and without preemption
• Multi-tenant queues:
  – adhoc (default): min 25%, max 50%
  – batch: min 25%, max 75%
  – prod: min 50%
    • prod.reports: min 80% of prod, max 100% of cluster
    • prod.ops: min 20% of prod, max 50% of cluster
• Demonstrates cluster automation with the Ambari REST API:
  – Scheduler changes and refresh
  – YARN configuration changes
  – YARN restart
  – Launching MR jobs
33.
Takeaways & QA
34.
Thank You!