DEBUGGING HIVE WITH 
HADOOP IN THE CLOUD 
Soam Acharya, David Chaiken, Denis Sheahan, Charles Wimmer 
Altiscale, Inc. 
#OCBigData @ 20140917T1845-0700
WHO ARE WE? 
• Altiscale: Infrastructure Nerds! 
• Hadoop As A Service 
• Rack and build our own Hadoop clusters 
• Provide a suite of Hadoop tools 
o Hive, Pig, Oozie 
o Others as needed: R, Python, Spark, Mahout, Impala, etc. 
• Monthly billing plan: compute (YARN), storage (HDFS) 
• https://www.altiscale.com 
• @Altiscale #HadoopSherpa
TALK ROADMAP 
• Our Platform and Perspective 
• Hadoop 2 Primer 
• Hadoop Debugging Tools 
• Accessing Logs in Hadoop 2 
• Hive + Hadoop Architecture 
• Hive Logs 
• Hive Issues + Case Studies 
o Hive + Interactive (DRAM Centric) Processing Engines 
• Conclusion: Making Hive Easier to Use
OUR DYNAMIC PLATFORM 
• Hadoop 2.0.5 => Hadoop 2.2.0 => Hadoop 2.4.1 
• Hive 0.10 => Hive 0.12 => Stinger (Hive 0.13 + Tez) 
• Hive, Pig and Oozie most commonly used tools 
• Working with customers on: 
Spark, Impala, 0xdata, Flume, Camus/Kafka, …
ALTISCALE PERSPECTIVE 
• What we do as a service provider… 
o Performance + Reliability: Jobs finish faster, fewer failures 
o Instant Access: Always-on access to HDFS and YARN 
o Hadoop Helpdesk: Tools + experts ensure customer success 
o Secure: Networking, SOC 2 Audit, Kerberos 
o Results: Faster Time-to-Value (TTV), Lower TCO 
• Operational approach in this presentation… 
o How to use Hadoop 2 cluster tools and logs 
to debug and to tune Hive 
o This talk will not focus on query optimization
QUICK PRIMER – HADOOP 2 
[Slide diagram: a Hadoop 2 cluster. The Resource Manager, NameNode, and Secondary NameNode run on master nodes; each Hadoop slave runs a Node Manager and a DataNode.]
QUICK PRIMER – HADOOP 2 YARN 
• Resource Manager (per cluster) 
o Manages job scheduling and execution 
o Global resource allocation 
• Application Master (per job) 
o Manages task scheduling and execution 
o Local resource allocation 
• Node Manager (per-machine agent) 
o Manages the lifecycle of task containers 
o Reports to RM on health and resource usage
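A quick way to see these pieces from the command line (a minimal sketch; assumes a standard Hadoop 2 CLI on a cluster gateway host):

  # List Node Managers known to the Resource Manager (one line per slave)
  yarn node -list

  # List running YARN applications (each has its own Application Master)
  yarn application -list

  # Show the status of one application, including its tracking URL
  yarn application -status application_1382973574141_4536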
HADOOP 1 VS HADOOP 2 
• No more JobTrackers, TaskTrackers 
• YARN ~ Operating System for Clusters 
o MapReduce is implemented as a YARN application 
o Bring on the applications! (Spark is just the start…) 
• Should be Transparent to Hive users
HADOOP 2 DEBUGGING TOOLS 
• Monitoring 
o System state of cluster: 
! CPU, Memory, Network, Disk 
! Nagios, Ganglia, Sensu! 
! Collectd, statd, Graphite 
o Hadoop level 
! HDFS usage 
! Resource usage: 
• Container memory allocated vs used 
• # of jobs running at the same time 
• Long running tasks
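At the Hadoop level, one low-effort way to sample these numbers is the Resource Manager REST API and CLI (a sketch; the hostname and default port 8088 are assumptions for a stock setup):

  # Cluster-wide totals: running apps, allocated vs. available container memory
  curl -s http://resourcemanager.example.com:8088/ws/v1/cluster/metrics

  # Applications currently running (useful for spotting long-running jobs)
  yarn application -list -appStates RUNNING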
HADOOP 2 DEBUGGING TOOLS 
• Hadoop logs 
o Daemon logs: Resource Manager, NameNode, DataNode 
o Application logs: Application Master, MapReduce tasks 
o Job history file: resources allocated during job lifetime 
o Application configuration files: store all Hadoop application 
parameters 
• Source code instrumentation
ACCESSING LOGS IN HADOOP 2 
• To view the logs for a job, click on the link under the ID 
column in Resource Manager UI.
ACCESSING LOGS IN HADOOP 2 
• To view application top level logs, click on logs. 
• To view individual logs for the mappers and reducers, 
click on History.
ACCESSING LOGS IN HADOOP 2 
• Log output for the entire application.
ACCESSING LOGS IN HADOOP 2 
• Click on the Map link for mapper logs and the Reduce 
link for reducer logs.
ACCESSING LOGS IN HADOOP 2 
• Clicking on a single link under Name provides an 
overview for that particular map job.
ACCESSING LOGS IN HADOOP 2 
• Finally, clicking on the logs link will take you to the log 
output for that map job.
ACCESSING LOGS IN HADOOP 2 
• Fun, fun, donuts, and more fun…
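If clicking through the UI gets old, the same container logs can usually be pulled from the command line once the application has finished (a sketch; assumes log aggregation is enabled on the cluster):

  # Dump all container logs (AM, mappers, reducers) for one application
  yarn logs -applicationId application_1382973574141_4536 > app_4536_logs.txt

  # Then search for the interesting bits
  grep -n "OutOfMemoryError\|FileNotFoundException" app_4536_logs.txt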
HIVE + HADOOP 2 ARCHITECTURE 
• Hive 0.10+ 
[Slide diagram: Hive CLI clients and JDBC/ODBC clients (via HiveServer) talk to the Hive Metastore and submit work to the Hadoop 2 cluster.]
HIVE LOGS 
• Query Log location 
• From /etc/hive/hive-site.xml: 
<property>
  <name>hive.querylog.location</name>
  <value>/home/hive/log/${user.name}</value>
</property>

SessionStart SESSION_ID="soam_201402032341" TIME="1391470900594"
HIVE CLIENT LOGS 
• /etc/hive/hive-log4j.properties: 
o hive.log.dir=/var/log/hive/${user.name} 
2014-05-29 19:51:09,830 INFO parse.ParseDriver (ParseDriver.java:parse(179)) - Parsing command: select count(*) from dogfood_job_data
2014-05-29 19:51:09,852 INFO parse.ParseDriver (ParseDriver.java:parse(197)) - Parse Completed
2014-05-29 19:51:09,852 INFO ql.Driver (PerfLogger.java:PerfLogEnd(124)) - </PERFLOG method=parse start=1401393069830 end=1401393069852 duration=22>
2014-05-29 19:51:09,853 INFO ql.Driver (PerfLogger.java:PerfLogBegin(97)) - <PERFLOG method=semanticAnalyze>
2014-05-29 19:51:09,890 INFO parse.SemanticAnalyzer (SemanticAnalyzer.java:analyzeInternal(8305)) - Starting Semantic Analysis
2014-05-29 19:51:09,892 INFO parse.SemanticAnalyzer (SemanticAnalyzer.java:analyzeInternal(8340)) - Completed phase 1 of Semantic Analysis
2014-05-29 19:51:09,892 INFO parse.SemanticAnalyzer (SemanticAnalyzer.java:getMetaData(1060)) - Get metadata for source tables
2014-05-29 19:51:09,906 INFO parse.SemanticAnalyzer (SemanticAnalyzer.java:getMetaData(1167)) - Get metadata for subqueries
2014-05-29 19:51:09,909 INFO parse.SemanticAnalyzer (SemanticAnalyzer.java:getMetaData(1187)) - Get metadata for destination tables
HIVE METASTORE LOGS 
• /etc/hive-metastore/hive-log4j.properties: 
o hive.log.dir=/service/log/hive-metastore/${user.name} 
2014-05-29 19:50:50,179 INFO metastore.HiveMetaStore (HiveMetaStore.java:logInfo(454)) - 200: source:/10.252.18.94 get_table : db=default tbl=dogfood_job_data
2014-05-29 19:50:50,180 INFO HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(239)) - ugi=chaiken ip=/10.252.18.94 cmd=source:/10.252.18.94 get_table : db=default tbl=dogfood_job_data
2014-05-29 19:50:50,236 INFO metastore.HiveMetaStore (HiveMetaStore.java:logInfo(454)) - 200: source:/10.252.18.94 get_table : db=default tbl=dogfood_job_data
2014-05-29 19:50:50,236 INFO HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(239)) - ugi=chaiken ip=/10.252.18.94 cmd=source:/10.252.18.94 get_table : db=default tbl=dogfood_job_data
2014-05-29 19:50:50,261 INFO metastore.HiveMetaStore (HiveMetaStore.java:logInfo(454)) - 200: source:/10.252.18.94 get_table : db=default tbl=dogfood_job_data
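All three Hive logs above are plain text files, so a quick grep is often enough to reconstruct what a session did (a sketch; the directories come from the configs shown above, and exact file names vary by Hive version):

  # Query log: one line per session/query event for this user
  grep -r "SessionStart\|QueryStart" /home/hive/log/$USER/ | tail -20

  # Client log: parser, semantic analysis, and PerfLogger entries
  grep -r "PERFLOG" /var/log/hive/$USER/ | tail -20

  # Metastore audit log: who touched which table, from which host
  grep -r "HiveMetaStore.audit" /service/log/hive-metastore/ | tail -20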
HIVE ISSUES + CASE STUDIES 
• Hive Issues 
o Hive client out of memory 
o Hive map/reduce task out of memory 
o Hive metastore out of memory 
o Hive launches too many tasks 
• Case Studies: 
o Hive “stuck” job 
o Hive “missing directories” 
o Analyze Hive Query Execution 
o Hive + Interactive (DRAM Centric) Processing Engines
HIVE CLIENT OUT OF MEMORY 
• Memory intensive client side hive query (map-side join) 
Number of reduce tasks not specified. Estimated from input data size: 999
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
java.lang.OutOfMemoryError: Java heap space
  at java.nio.CharBuffer.wrap(CharBuffer.java:350)
  at java.nio.CharBuffer.wrap(CharBuffer.java:373)
  at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:138)
HIVE CLIENT OUT OF MEMORY 
• Use HADOOP_HEAPSIZE prior to launching Hive client 
• HADOOP_HEAPSIZE=<new heapsize> hive <fileName>
• Watch out for the HADOOP_CLIENT_OPTS issue in hive-env.sh!
• Know how much memory is available on the machine running the
client… do not exceed it or use a disproportionate share.
$ free -m
             total       used       free     shared    buffers     cached
Mem:          1695       1388        306          0         60        424
-/+ buffers/cache:         903        791
Swap:          895         101        794
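One way the HADOOP_CLIENT_OPTS gotcha typically plays out (a sketch; the file name my_query.hql and the 2048 MB value are made up for illustration):

  # Raise the Hive client heap for this invocation only
  HADOOP_HEAPSIZE=2048 hive -f my_query.hql

  # If hive-env.sh pins a small -Xmx via HADOOP_CLIENT_OPTS, it can override
  # HADOOP_HEAPSIZE, so override that setting as well
  export HADOOP_CLIENT_OPTS="-Xmx2048m"
  hive -f my_query.hql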
HIVE TASK OUT OF MEMORY 
• Query spawns MapReduce jobs that run out of memory 
• How to find this issue? 
o Hive diagnostic message 
o Hadoop MapReduce logs
HIVE TASK OUT OF MEMORY 
• Fix is to increase task RAM allocation… 
set mapreduce.map.memory.mb=<new RAM allocation>;
set mapreduce.reduce.memory.mb=<new RAM allocation>;
• Also watch out for…
set mapreduce.map.java.opts=-Xmx<heap size>m;
set mapreduce.reduce.java.opts=-Xmx<heap size>m;
• Not a magic bullet – requires manual tuning 
• Increase in individual container memory size: 
o Decrease in overall containers that can be run 
o Decrease in overall parallelism
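A hedged example of how the two knobs usually relate: the JVM heap (-Xmx) should be somewhat smaller than the container size so there is headroom for non-heap memory. The 4096/3276 MB values below are illustrative, not a recommendation:

  # Larger task containers, with the JVM heap at roughly 80% of the container
  hive -e "
    set mapreduce.map.memory.mb=4096;
    set mapreduce.map.java.opts=-Xmx3276m;
    set mapreduce.reduce.memory.mb=4096;
    set mapreduce.reduce.java.opts=-Xmx3276m;
    select count(*) from dogfood_job_data;
  "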
HIVE METASTORE OUT OF MEMORY 
• Out of memory issues not necessarily dumped to logs 
• Metastore can become unresponsive 
• Can’t submit queries 
• Restart with a higher heap size: 
export HADOOP_HEAPSIZE in hcat_server.sh 
• After notifying hive users about downtime: 
service hcat restart
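A minimal sketch of that restart procedure (the script location and the 4096 MB value are assumptions for a typical install):

  # In hcat_server.sh (or the metastore start script), raise the daemon heap
  export HADOOP_HEAPSIZE=4096

  # After notifying Hive users about the downtime, bounce the metastore
  sudo service hcat restart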
HIVE LAUNCHES TOO MANY TASKS 
• Typically a function of the input data set 
• Lots of little files
HIVE LAUNCHES TOO MANY TASKS 
• Set mapred.max.split.size to an appropriate fraction of the data size
• Also verify that (see the sketch below)
hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
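Putting the two settings together, a hedged example for an input made of many small files (the 256 MB split size is illustrative):

  # Combine many small files into fewer, larger splits so fewer mappers launch
  hive -e "
    set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
    set mapred.max.split.size=268435456;
    select count(*) from dogfood_job_data;
  "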
CASE STUDY: HIVE STUCK JOB 
From an Altiscale customer: 
“This job [jobid] has been running now for 
41 hours. Is it still progressing or has 
something hung up the map/reduce so it’s 
just spinning? Do you have any insight?”
HIVE STUCK JOB 
1. Received jobId, 
application_1382973574141_4536, from client 
2. Logged into client cluster. 
3. Pulled up Resource Manager 
4. Entered part of jobId (4536) in the search box. 
5. Clicked on the link that says: 
application_1382973574141_4536
6. On resulting Application Overview page, clicked on link 
next to “Tracking URL” that said Application Master
HIVE STUCK JOB 
7. On resulting MapReduce Application page, we clicked on the 
Job Id (job_1382973574141_4536). 
8. The resulting MapReduce Job page displayed detailed status 
of the mappers, including 4 failed mappers 
9. We then clicked on the 4 link on the Maps row in the Failed 
column. 
10. Title of the next page was “FAILED Map attempts in 
job_1382973574141_4536.” 
11. Each failed mapper generated an error message. 
12. Buried in the 16th line: 
Caused by: java.io.FileNotFoundException: File does not exist:
hdfs://opaque_hostname:8020/HiveTableDir/FileName.log.date.seq
HIVE STUCK JOB 
• Job was stuck for a day or so, retrying a mapper that 
would never finish successfully. 
• During the job, our customer’s colleague realized the input 
file was corrupted and deleted it. 
• The colleague did not anticipate the effect of removing 
corrupted data on a running job 
• Hadoop didn’t make it easy to find out: 
o RM => search => application link => AM overview page => MR 
Application Page => MR Job Page => Failed jobs page => 
parse long logs 
o Task retry without hope of success
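In hindsight, much of that clicking can be collapsed into a couple of commands once the application id is known (a sketch; pulling aggregated logs assumes the application has finished or been killed and that log aggregation is on):

  # Is the application actually making progress?
  yarn application -status application_1382973574141_4536

  # Pull all task logs and look for the buried root cause
  yarn logs -applicationId application_1382973574141_4536 | grep -n "Caused by:" | head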
HIVE “MISSING DIRECTORIES” 
From an Altiscale customer: 
“One problem we are seeing after the 
[Hive Metastore] restart is that we lost 
quite a few directories in [HDFS]. Is there 
a way to recover these?”
HIVE “MISSING DIRECTORIES” 
• Obtained list of “missing” directories from customer: 
o /hive/biz/prod/* 
• Confirmed they were missing from HDFS 
• Searched through NameNode audit log to get block IDs that 
belonged to missing directories. 
13/07/24 21:10:08 INFO hdfs.StateChange: BLOCK* NameSystem.allocateBlock:
/hive/biz/prod/incremental/carryoverstore/postdepuis/lmt_unmapped_pggroup_schema._COPYING_.
BP-798113632-10.251.255.251-1370812162472
blk_3560522076897293424_2448396{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[10.251.255.177:50010|RBW],
ReplicaUnderConstruction[10.251.255.174:50010|RBW],
ReplicaUnderConstruction[10.251.255.169:50010|RBW]]}
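That search is just a grep over the NameNode logs (a sketch; the log path below is an assumption and differs per install):

  # Pull block allocations under the "missing" directory and extract the block IDs
  grep "allocateBlock: /hive/biz/prod" /var/log/hadoop-hdfs/hadoop-hdfs-namenode-*.log* \
    | grep -o "blk_[0-9_]*"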
HIVE “MISSING DIRECTORIES” 
• Used blockID to locate exact time of file deletion from 
Namenode logs: 
13/07/31 08:10:33 INFO hdfs.StateChange: BLOCK* addToInvalidates:
blk_3560522076897293424_2448396 to
10.251.255.177:50010 10.251.255.169:50010 10.251.255.174:50010
• Used time of deletion to inspect hive logs
HIVE “MISSING DIRECTORIES” 
QueryStart QUERY_STRING="create database biz_weekly location '/hive/biz/prod'" QUERY_ID="usrprod_20130731043232_0a40fd32-8c8a-479c-ba7d-3bd8a2698f4b" TIME="1375245164667"
:
QueryEnd QUERY_STRING="create database biz_weekly location '/hive/biz/prod'" QUERY_ID="usrprod_20130731043232_0a40fd32-8c8a-479c-ba7d-3bd8a2698f4b" QUERY_RET_CODE="0" QUERY_NUM_TASKS="0" TIME="1375245166203"
:
QueryStart QUERY_STRING="drop database biz_weekly" QUERY_ID="usrprod_20130731073333_e9acf35c-4f07-4f12-bd9d-bae137ae0733" TIME="1375256014799"
:
QueryEnd QUERY_STRING="drop database biz_weekly" QUERY_ID="usrprod_20130731073333_e9acf35c-4f07-4f12-bd9d-bae137ae0733" QUERY_NUM_TASKS="0" TIME="1375256014838"
HIVE “MISSING DIRECTORIES” 
• In effect, user “usrprod” issued: 
At 2013-07-31 04:32:44: create database biz_weekly 
location '/hive/biz/prod' 
At 2013-07-31 07:33:34: drop database biz_weekly 
• This is functionally equivalent to: 
hdfs dfs -rm -r /hive/biz/prod
HIVE “MISSING DIRECTORIES” 
• Customer manually placed their own data in /hive – 
the warehouse directory managed and controlled by hive 
• Customer used CREATE and DROP db commands in 
their code 
o Hive deletes database and table locations in /hive with 
impunity 
• Why didn’t deleted data end up in .Trash? 
o Trash collection not turned on in configuration settings 
o It is now, but Hive still needs a -skipTrash option (HIVE-6469)
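A sketch of what trash changes, using the paths from this case (fs.trash.interval in core-site.xml controls retention and must be greater than zero for trash to be active):

  # With trash enabled, the delete is recoverable for a while...
  hdfs dfs -rm -r /hive/biz/prod            # moves to /user/<user>/.Trash

  # ...unless the caller bypasses it explicitly
  hdfs dfs -rm -r -skipTrash /hive/biz/prod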
HIVE “MISSING DIRECTORIES” 
• Hadoop forensics: piece together disparate sources… 
o Hadoop daemon logs (NameNode) 
o Hive query and metastore logs 
o Hadoop config files 
• Need better tools to correlate the different layers of the 
system: hive client, hive metastore, MapReduce job, 
YARN, HDFS, operating system metrics, … 
By the way… Operating any distributed system would be 
totally insane without NTP and a standard time zone (UTC).
CASE STUDY – ANALYZE QUERY 
• Customer provided Hive query + data sets 
(100GBs to ~5 TBs) 
• Needed help optimizing the query 
• Didn’t rewrite query immediately 
• Wanted to characterize query performance and isolate 
bottlenecks first
ANALYZE AND TUNE EXECUTION 
• Ran original query on the datasets in our environment: 
o Two M/R Stages: Stage-1, Stage-2 
• Long running reducers run out of memory 
o set mapreduce.reduce.memory.mb=5120 
o Fewer reduce slots fit on the cluster, which extends reduce time 
• Query fails to launch Stage-2 with out of memory 
o set HADOOP_HEAPSIZE=1024 on client machine 
• Query has 250,000 Mappers in Stage-2 which causes 
failure 
o set mapred.max.split.size=5368709120 
to reduce Mappers
ANALYSIS: HOW TO VISUALIZE? 
• Next challenge - how to visualize job execution? 
• Existing hadoop/hive logs not sufficient for this task 
• Wrote internal tools 
o parse job history files 
o plot mapper and reducer execution
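Even without custom tooling, the job history files carry per-task start and finish times that can be dumped for plotting (a sketch; the history done-dir layout is configuration dependent and the path below is an assumption):

  # Find the .jhist file for the job
  hdfs dfs -ls -R /mr-history/done | grep job_1382973574141_4536

  # Summarize per-task timing (feed the output to your plotting tool)
  mapred job -history <path-to-jhist-file>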
ANALYSIS: MAP STAGE-1
ANALYSIS: REDUCE STAGE-1 
[Chart annotation: single reduce task]
ANALYSIS: MAP STAGE-2
ANALYSIS: REDUCE STAGE-2
ANALYZE EXECUTION: FINDINGS 
• Lone, long running reducer in first stage of query 
• Analyzed input data: 
o The query splits input data by userId 
o Bucketized the input data by userId 
o Found one very large bucket: the “invalid” userId 
o Discussed the “invalid” userId with the customer 
• An error value is a common pattern! 
o Need to differentiate between “don’t know and don’t care” 
and “don’t know and do care” 
INTERACTIVE (DRAM CENTRIC) 
PROCESSING SYSTEMS 
• Loading data into DRAM makes processing fast! 
• Examples: Spark, Impala, 0xdata, …, [SAP HANA], … 
• Streaming systems (Storm, DataTorrent) may be similar 
• Need to increase YARN container memory size
HIVE + INTERACTIVE: 
WATCH OUT FOR CONTAINER SIZE 
• Caution: larger YARN container settings for interactive 
jobs may not be right for batch systems like Hive 
• Container size needs to account for both vcores and memory: 
yarn.scheduler.maximum-allocation-vcores 
yarn.nodemanager.resource.cpu-vcores ...
HIVE + INTERACTIVE: 
WATCH OUT FOR FRAGMENTATION 
• Attempting to schedule interactive systems and batch 
systems like Hive may result in fragmentation 
• Interactive systems may require all-or-nothing scheduling 
• Batch jobs with little tasks may starve interactive jobs
HIVE + INTERACTIVE: 
WATCH OUT FOR FRAGMENTATION 
Solutions for fragmentation… 
• Reserve interactive nodes before starting batch jobs 
• Reduce interactive container size (if the algorithm permits) 
• Node labels (YARN-726) and gang scheduling (YARN-624)
CONCLUSIONS 
• Hive + Hadoop debugging can get very complex 
o Sifting through many logs and screens 
o Automatic transmission versus manual transmission 
• Static memory partitioning imposed by the Java Virtual Machine 
has benefits but also creates challenges. 
• Where there are difficulties, there’s opportunity: 
o Better tooling, instrumentation, integration of logs/metrics 
• YARN still evolving into an operating system 
• Hadoop as a Service: aggregate and share expertise 
• Need to learn from the traditional database community!
QUESTIONS? COMMENTS?
