Debugging Hive with Hadoop-in-the-Cloud by David Chaiken of Altiscale

DEBUGGING HIVE WITH
HADOOP IN THE CLOUD
Soam Acharya, David Chaiken, Denis Sheahan, Charles Wimmer
Altiscale, Inc.
#LABDUG @ 20150115T19:30-0800

WHO ARE WE?
•  Altiscale: Infrastructure Nerds!
•  Hadoop As A Service
•  Rack and build our own Hadoop clusters
•  Provide a suite of Hadoop tools
o  Hive, Pig, Oozie
o  Others as needed: R, Python, Spark, Mahout, Impala, etc.
•  Monthly billing plan: compute (YARN), storage (HDFS)
•  https://www.altiscale.com
•  @Altiscale #HadoopSherpa

TALK ROADMAP
•  Our Platform and Perspective
•  Hadoop 2 Primer
•  Hadoop Debugging Tools
•  Accessing Logs in Hadoop 2
•  Hive + Hadoop Architecture
•  Hive Logs
•  Hive Issues + Case Studies
o  Hive + Interactive (DRAM Centric) Processing Engines
•  Conclusion: Making Hive Easier to Use

OUR DYNAMIC PLATFORM
•  Hadoop 2.0.5 => Hadoop 2.2.0 => Hadoop 2.4.1 => …
•  Hive 0.10 => Hive 0.12 => Stinger (Hive 0.13 + Tez) => …
•  Hive, Pig and Oozie most commonly used tools
•  Working with customers on:
Spark, H2O, Trifacta, Impala, Flume, Camus/Kafka, …

ALTISCALE PERSPECTIVE
•  What we do as a service provider…
o  Performance + Reliability: Jobs finish faster, fewer failures
o  Instant Access: Always-on access to HDFS and YARN
o  Hadoop Helpdesk: Tools + experts ensure customer success
o  Secure: Networking, SOC 2 Audit, Kerberos
o  Results: Faster Time-to-Value (TTV), Lower TCO
•  Operational approach in this presentation…
o  How to use Hadoop 2 cluster tools and logs
to debug and to tune Hive
o  This talk will not focus on query optimization

QUICK PRIMER – HADOOP 2
[Diagram: Hadoop 2 cluster with a NameNode, a Secondary NameNode, a Resource Manager, and four Hadoop slaves (Node Managers + DataNodes).]

QUICK PRIMER – HADOOP 2 YARN
•  Resource Manager (per cluster)
o  Manages job scheduling and execution
o  Global resource allocation
•  Application Master (per job)
o  Manages task scheduling and execution
o  Local resource allocation
•  Node Manager (per-machine agent)
o  Manages the lifecycle of task containers
o  Reports to RM on health and resource usage

HADOOP 1 VS HADOOP 2
•  No more JobTrackers, TaskTrackers
•  YARN ~ Operating System for Clusters
o  MapReduce is implemented as a YARN application
o  Bring on the applications! (Spark is just the start…)
•  Should be Transparent to Hive users

HADOOP 2 DEBUGGING TOOLS
•  Monitoring
o  System state of cluster:
§  CPU, Memory, Network, Disk
§  Nagios, Ganglia, Sensu!
§  Collectd, statsd, Graphite
o  Hadoop level
§  HDFS usage
§  Resource usage:
•  Container memory allocated vs used
•  # of jobs running at the same time
•  Long running tasks
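
A minimal sketch of the Hadoop-level checks above, assuming the Resource Manager's REST endpoint /ws/v1/cluster/metrics is reachable and returns its usual clusterMetrics JSON. The sample payload and its numbers are made up for illustration. The RM reports memory allocated to containers; comparing that against memory actually used still needs node-level metrics.

```python
import json

# Hypothetical, trimmed response from
# GET http://<resource-manager>:8088/ws/v1/cluster/metrics
SAMPLE = (
    '{"clusterMetrics": {"appsRunning": 12, "allocatedMB": 196608, '
    '"totalMB": 262144, "containersAllocated": 96}}'
)

def summarize_cluster(metrics_json):
    """Condense the RM cluster metrics into a few numbers worth watching."""
    m = json.loads(metrics_json)["clusterMetrics"]
    return {
        "apps_running": m["appsRunning"],
        "containers_allocated": m["containersAllocated"],
        "memory_allocated_fraction": m["allocatedMB"] / m["totalMB"],
    }

print(summarize_cluster(SAMPLE))
```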

HADOOP 2 DEBUGGING TOOLS
•  Hadoop logs
o  Daemon logs: Resource Manager, NameNode, DataNode
o  Application logs: Application Master, MapReduce tasks
o  Job history file: resources allocated during job lifetime
o  Application configuration files: store all Hadoop application
parameters
•  Source code instrumentation

ACCESSING LOGS IN HADOOP 2
•  To view the logs for a job, click on the link under the ID
column in Resource Manager UI.

•  To view application top level logs, click on logs.
•  To view individual logs for the mappers and reducers,
click on History.

•  Log output for the entire application.

•  Click on the Map link for mapper logs and the Reduce
link for reducer logs.

•  Clicking on a single link under Name provides an
overview for that particular map job.

•  Finally, clicking on the logs link will take you to the log
output for that map job.

•  Fun, fun, donuts, and more fun…

HIVE + HADOOP 2 ARCHITECTURE
•  Hive 0.10+

[Diagram: Hive CLI, Hive Metastore, and HiveServer2 (JDBC/ODBC clients such as Tableau, Kettle, …) connected to the Hadoop 2 cluster.]

HIVE LOGS
•  Query Log location
•  From /etc/hive/hive-site.xml:
<property>
  <name>hive.querylog.location</name>
  <value>/home/hive/log/${user.name}</value>
</property>
SessionStart SESSION_ID="soam_201402032341" TIME="1391470900594"

HIVE CLIENT LOGS
•  /etc/hive/hive-log4j.properties:
o  hive.log.dir=/var/log/hive/${user.name}
2014-05-29 19:51:09,830 INFO parse.ParseDriver (ParseDriver.java:parse(179)) - Parsing command: select count(*) from dogfood_job_data
2014-05-29 19:51:09,852 INFO parse.ParseDriver (ParseDriver.java:parse(197)) - Parse Completed
2014-05-29 19:51:09,852 INFO ql.Driver (PerfLogger.java:PerfLogEnd(124)) - </PERFLOG method=parse start=1401393069830 end=1401393069852 duration=22>
2014-05-29 19:51:09,853 INFO ql.Driver (PerfLogger.java:PerfLogBegin(97)) - <PERFLOG method=semanticAnalyze>
2014-05-29 19:51:09,890 INFO parse.SemanticAnalyzer (SemanticAnalyzer.java:analyzeInternal(8305)) - Starting Semantic Analysis
(SemanticAnalyzer.java:analyzeInternal(8340)) - Completed phase 1 of Semantic Analysis
(SemanticAnalyzer.java:getMetaData(1060)) - Get metadata for source tables
(SemanticAnalyzer.java:getMetaData(1167)) - Get metadata for subqueries
(SemanticAnalyzer.java:getMetaData(1187)) - Get metadata for destination tables

HIVE METASTORE LOGS
•  /etc/hive-metastore/hive-log4j.properties:
o  hive.log.dir=/service/log/hive-metastore/${user.name}
2014-05-29 19:50:50,179 INFO metastore.HiveMetaStore (HiveMetaStore.java:logInfo(454)) - 200: source:/10.252.18.94 get_table : db=default tbl=dogfood_job_data
2014-05-29 19:50:50,180 INFO HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(239)) - ugi=chaiken ip=/10.252.18.94 cmd=source:/10.252.18.94 get_table : db=default tbl=dogfood_job_data
2014-05-29 19:50:50,236 INFO HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(239)) - ugi=chaiken ip=/10.252.18.94 cmd=source:/10.252.18.94 get_table : db=default tbl=dogfood_job_data
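
The Hive client and metastore excerpts share one log4j layout, so a single parser can feed both into whatever correlation tooling is at hand. This is a sketch only: the regular expression is inferred from the excerpts in these slides, not taken from the log4j configuration, and the function name is hypothetical.

```python
import re
from datetime import datetime

# Layout inferred from the excerpts:
# "<date> <time>,<ms> <LEVEL> <logger> (<File>:<method>(<line>)) - <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+) "
    r"\((?P<source>[^:]+):(?P<method>\w+)\((?P<line>\d+)\)\) - "
    r"(?P<message>.*)$"
)

def parse_hive_log_line(line):
    """Return the fields of one log line, or None for lines that do not match
    (for example, continuation lines without a timestamp)."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["ts"] = datetime.strptime(fields["ts"], "%Y-%m-%d %H:%M:%S,%f")
    fields["line"] = int(fields["line"])
    return fields

record = parse_hive_log_line(
    "2014-05-29 19:50:50,180 INFO HiveMetaStore.audit "
    "(HiveMetaStore.java:logAuditEvent(239)) - ugi=chaiken ip=/10.252.18.94 "
    "cmd=source:/10.252.18.94 get_table : db=default tbl=dogfood_job_data"
)
print(record["ts"], record["logger"], record["message"])
```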

HIVE ISSUES + CASE STUDIES
•  Hive Issues
o  Hive client out of memory
o  Hive map/reduce task out of memory
o  Hive metastore out of memory
o  Hive launches too many tasks
•  Case Studies:
o  Hive “stuck” job
o  Hive “missing directories”
o  Analyze Hive Query Execution
o  Hive + Interactive (DRAM Centric) Processing Engines

HIVE CLIENT OUT OF MEMORY
•  Memory intensive client side hive query (map-side join)
Number of reduce tasks not specified. Estimated from input data size: 999
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
java.lang.OutOfMemoryError: Java heap space
  at java.nio.CharBuffer.wrap(CharBuffer.java:350)
  at java.nio.CharBuffer.wrap(CharBuffer.java:373)
  at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:138)

HIVE CLIENT OUT OF MEMORY
•  Set HADOOP_HEAPSIZE (in MB) prior to launching Hive client
•  HADOOP_HEAPSIZE=<new heapsize> hive -f <fileName>
•  Watch out for HADOOP_CLIENT_OPTS issue in hive-env.sh!
•  Know how much memory is available on the machine
running the client. Do not exceed it or claim a
disproportionate share.
$ free -m
             total   used   free  shared  buffers  cached
Mem:          1695   1388    306       0       60     424
-/+ buffers/cache:    903    791
Swap:          895    101    794
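
A rough sketch of that sanity check, using the free -m output shown on this slide. It assumes the older procps layout with a "-/+ buffers/cache" row. The 50% ceiling (max_fraction) is an arbitrary illustration, not a recommendation from the talk.

```python
# `free -m` output from the slide (older procps layout).
FREE_OUTPUT = """\
             total   used   free  shared  buffers  cached
Mem:          1695   1388    306       0       60     424
-/+ buffers/cache:    903    791
Swap:          895    101    794
"""

def available_mb(free_output):
    """MB free once buffers and cache are discounted."""
    for line in free_output.splitlines():
        if line.startswith("-/+ buffers/cache:"):
            _used, free = line.split(":", 1)[1].split()
            return int(free)
    raise ValueError("no '-/+ buffers/cache' row in this free output")

def heap_fits(heap_mb, free_output, max_fraction=0.5):
    """True if the proposed client heap stays under the chosen share of free RAM."""
    return heap_mb <= available_mb(free_output) * max_fraction

for heap_mb in (256, 1024):
    print(heap_mb, heap_fits(heap_mb, FREE_OUTPUT))
```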

HIVE TASK OUT OF MEMORY
•  Query spawns MapReduce jobs that run out of memory
•  How to find this issue?
o  Hive diagnostic message
o  Hadoop MapReduce logs

HIVE TASK OUT OF MEMORY
•  Fix is to increase task RAM allocation…
set mapreduce.map.memory.mb=<new RAM allocation>;
set mapreduce.reduce.memory.mb=<new RAM allocation>;
•  Also watch out for the JVM heap, which must fit inside the container:
set mapreduce.map.java.opts=-Xmx<heap size>m;
set mapreduce.reduce.java.opts=-Xmx<heap size>m;
•  Not a magic bullet – requires manual tuning
•  Increase in individual container memory size:
o  Decrease in overall containers that can be run
o  Decrease in overall parallelism
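
The trade-off can be put in numbers. The sketch below generates the four settings from this slide and shows how a larger container cuts the number that fit on a node. The 80% heap-to-container ratio is a common rule of thumb assumed here, not a figure from the talk, and the 48 GB node is hypothetical.

```python
def hive_memory_settings(map_mb, reduce_mb, heap_percent=80):
    """Hive `set` statements with the JVM heap sized to a share of each container."""
    def xmx(container_mb):
        return "-Xmx%dm" % (container_mb * heap_percent // 100)
    return [
        "set mapreduce.map.memory.mb=%d;" % map_mb,
        "set mapreduce.map.java.opts=%s;" % xmx(map_mb),
        "set mapreduce.reduce.memory.mb=%d;" % reduce_mb,
        "set mapreduce.reduce.java.opts=%s;" % xmx(reduce_mb),
    ]

def containers_per_node(node_mb, container_mb):
    """How many containers of this size one node can host (memory only)."""
    return node_mb // container_mb

print("\n".join(hive_memory_settings(map_mb=2048, reduce_mb=5120)))
NODE_MB = 49152  # hypothetical node offering 48 GB to YARN
for size_mb in (2048, 5120):
    print(size_mb, "MB containers per node:", containers_per_node(NODE_MB, size_mb))
```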

HIVE METASTORE OUT OF MEMORY
•  Out of memory issues not necessarily dumped to logs
•  Metastore can become unresponsive
•  Can’t submit queries
•  Restart with a higher heap size:
export HADOOP_HEAPSIZE in hcat_server.sh
•  After notifying hive users about downtime:
service hcat restart

HIVE LAUNCHES TOO MANY TASKS
•  Typically a function of the input data set
•  Lots of little files

HIVE LAUNCHES TOO MANY TASKS
•  Set mapred.max.split.size to appropriate fraction of data size
•  Also verify that
hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
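
A back-of-the-envelope estimate of why lots of little files hurt and how combining helps. This is a deliberate simplification: it treats a combining input format as packing files up to the maximum split size and ignores the node and rack locality grouping a real one performs. The file counts and sizes are hypothetical.

```python
import math

def estimate_map_tasks(file_sizes, max_split_bytes, combine=True):
    """Rough mapper count for a list of input file sizes in bytes."""
    if combine:
        # Small files are packed together, so only total volume matters.
        return max(1, math.ceil(sum(file_sizes) / max_split_bytes))
    # Without combining, every file yields at least one split of its own.
    return sum(max(1, math.ceil(size / max_split_bytes)) for size in file_sizes)

MIB = 1024 * 1024
little_files = [1 * MIB] * 10000  # hypothetical: 10,000 files of 1 MiB each
print(estimate_map_tasks(little_files, 256 * MIB, combine=False))
print(estimate_map_tasks(little_files, 256 * MIB, combine=True))
```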

CASE STUDY: HIVE STUCK JOB
From an Altiscale customer:
“This job [jobid] has been running now for
41 hours. Is it still progressing or has
something hung up the map/reduce so it’s
just spinning? Do you have any insight?”

HIVE STUCK JOB
1.  Received jobId,
application_1382973574141_4536, from client
2.  Logged into client cluster.
3.  Pulled up Resource Manager
4.  Entered part of jobId (4536) in the search box.
5.  Clicked on the link that says:
application_1382973574141_4536
6.  On resulting Application Overview page, clicked on link
next to “Tracking URL” that said Application Master

HIVE STUCK JOB
7.  On resulting MapReduce Application page, we clicked on the
Job Id (job_1382973574141_4536).
8.  The resulting MapReduce Job page displayed detailed status
of the mappers, including 4 failed mappers
9.  We then clicked on the 4 link on the Maps row in the Failed
column.
10. Title of the next page was “FAILED Map attempts in
job_1382973574141_4536.”
11.  Each failed mapper generated an error message.
12. Buried in the 16th line:
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://opaque_hostname:8020/HiveTableDir/FileName.log.date.seq
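
Two pieces of this click-through are mechanical enough to script. The sketch below maps the YARN application ID to the MapReduce job ID (the two differ only in prefix, as in the IDs above) and pulls the "Caused by:" lines out of a long error message. The function names are hypothetical, and the filler stack frames in the sample are invented; only the final line comes from the slides.

```python
def job_id_for(application_id):
    """application_<clusterTimestamp>_<seq> -> job_<clusterTimestamp>_<seq>"""
    prefix, cluster_ts, seq = application_id.split("_")
    if prefix != "application":
        raise ValueError("not a YARN application id: %r" % application_id)
    return "job_%s_%s" % (cluster_ts, seq)

def find_root_causes(log_text):
    """Return (line_number, text) for each 'Caused by:' line, numbered from 1."""
    return [
        (number, line.strip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if line.lstrip().startswith("Caused by:")
    ]

# Invented filler frames followed by the real cause from the case study.
sample = "\n".join(
    ["Error: java.io.IOException: wrapper exception"]
    + ["\tat some.package.SomeClass.method(SomeClass.java:1)"] * 14
    + ["Caused by: java.io.FileNotFoundException: File does not exist: "
       "hdfs://opaque_hostname:8020/HiveTableDir/FileName.log.date.seq"]
)
print(job_id_for("application_1382973574141_4536"))
print(find_root_causes(sample))
```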

HIVE STUCK JOB
•  Job was stuck for a day or so, retrying a mapper that
would never finish successfully.
•  During the job, our customer’s colleague realized the input
file was corrupted and deleted it.
•  The colleague did not anticipate the effect of removing
corrupted data on a running job
•  Hadoop didn’t make it easy to find out:
o  RM => search => application link => AM overview page => MR
Application Page => MR Job Page => Failed jobs page =>
parse long logs
o  Task retry without hope of success

HIVE “MISSING DIRECTORIES”
From an Altiscale customer:
“One problem we are seeing after the
[Hive Metastore] restart is that we lost
quite a few directories in [HDFS]. Is there
a way to recover these?”

•  Obtained list of “missing” directories from customer:
o  /hive/biz/prod/*
•  Confirmed they were missing from HDFS
•  Searched through NameNode audit log to get block IDs that
belonged to missing directories.
13/07/24 21:10:08 INFO hdfs.StateChange: BLOCK* NameSystem.allocateBlock:
/hive/biz/prod/incremental/carryoverstore/postdepuis/lmt_unmapped_pggroup_schema._COPYING_.
BP-798113632-10.251.255.251-1370812162472
blk_3560522076897293424_2448396{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[10.251.255.177:50010|RBW],
ReplicaUnderConstruction[10.251.255.174:50010|RBW],
ReplicaUnderConstruction[10.251.255.169:50010|RBW]]}

•  Used the block ID to locate the exact time of file deletion in
the NameNode logs:
13/07/31 08:10:33 INFO hdfs.StateChange: BLOCK* addToInvalidates:
blk_3560522076897293424_2448396 to 10.251.255.177:50010
10.251.255.169:50010 10.251.255.174:50010
•  Used time of deletion to inspect hive logs
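
The two NameNode log searches above can be sketched as a pair of passes over the log. The sketch assumes each log entry has been read as a single line, and matches only the two message types quoted in these slides (NameSystem.allocateBlock and addToInvalidates). The sample lines are abbreviated from the slide excerpts; the function names are hypothetical.

```python
import re

BLOCK_ID = re.compile(r"blk_-?\d+_\d+")

def blocks_allocated_under(log_lines, path_prefix):
    """Block IDs allocated for files whose path starts with path_prefix."""
    blocks = set()
    for line in log_lines:
        if "NameSystem.allocateBlock:" in line:
            rest = line.split("NameSystem.allocateBlock:", 1)[1].lstrip()
            if rest.startswith(path_prefix):
                blocks.update(BLOCK_ID.findall(rest))
    return blocks

def invalidation_times(log_lines, blocks):
    """(timestamp, block ID) for each addToInvalidates entry naming one of blocks."""
    hits = []
    for line in log_lines:
        if "addToInvalidates:" in line:
            for block in BLOCK_ID.findall(line):
                if block in blocks:
                    hits.append((line[:17], block))  # "yy/MM/dd HH:mm:ss"
    return hits

log = [
    "13/07/24 21:10:08 INFO hdfs.StateChange: BLOCK* NameSystem.allocateBlock: "
    "/hive/biz/prod/incremental/carryoverstore/postdepuis/lmt_unmapped_pggroup_schema._COPYING_. "
    "BP-798113632-10.251.255.251-1370812162472 blk_3560522076897293424_2448396{blockUCState=UNDER_CONSTRUCTION}",
    "13/07/31 08:10:33 INFO hdfs.StateChange: BLOCK* addToInvalidates: "
    "blk_3560522076897293424_2448396 to 10.251.255.177:50010 10.251.255.169:50010 10.251.255.174:50010",
]
missing = blocks_allocated_under(log, "/hive/biz/prod/")
print(invalidation_times(log, missing))
```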

QueryStart QUERY_STRING="create database biz_weekly location '/hive/biz/prod'" QUERY_ID="usrprod_20130731043232_0a40fd32-8c8a-479c-ba7d-3bd8a2698f4b" TIME="1375245164667"
:
QueryEnd QUERY_STRING="create database biz_weekly location '/hive/biz/prod'" QUERY_ID="usrprod_20130731043232_0a40fd32-8c8a-479c-ba7d-3bd8a2698f4b" QUERY_RET_CODE="0" QUERY_NUM_TASKS="0" TIME="1375245166203"
:
QueryStart QUERY_STRING="drop database biz_weekly" QUERY_ID="usrprod_20130731073333_e9acf35c-4f07-4f12-bd9d-bae137ae0733" TIME="1375256014799"
:
QueryEnd QUERY_STRING="drop database biz_weekly" QUERY_ID="usrprod_20130731073333_e9acf35c-4f07-4f12-bd9d-bae137ae0733" QUERY_NUM_TASKS="0" TIME="1375256014838"
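
Entries like these are an event name followed by KEY="value" pairs, which makes them easy to filter by user, statement, or time. A minimal parser sketch, assuming each entry sits on one line with plain double quotes and that values contain no embedded double quotes (the function name is hypothetical):

```python
import re

PAIR = re.compile(r'(\w+)="([^"]*)"')

def parse_history_entry(entry):
    """Split one Hive query log entry into (event, {key: value})."""
    event, _, rest = entry.strip().partition(" ")
    return event, dict(PAIR.findall(rest))

event, fields = parse_history_entry(
    'QueryStart QUERY_STRING="drop database biz_weekly" '
    'QUERY_ID="usrprod_20130731073333_e9acf35c-4f07-4f12-bd9d-bae137ae0733" '
    'TIME="1375256014799"'
)
print(event, fields["QUERY_STRING"], fields["TIME"])
```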

•  In effect, user “usrprod” issued:
At 2013-07-31 04:32:44: create database biz_weekly location '/hive/biz/prod'
At 2013-07-31 07:33:34: drop database biz_weekly
•  This is functionally equivalent to:
hdfs dfs -rm -r /hive/biz/prod
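
The wall-clock times come from the TIME fields in the query log, which are milliseconds since the Unix epoch. A small conversion sketch (the helper name is hypothetical), rendering them in UTC so they can be lined up against other logs:

```python
from datetime import datetime, timedelta, timezone

def query_time_utc(time_ms):
    """Convert a Hive query log TIME value (epoch milliseconds) to a UTC datetime."""
    seconds, millis = divmod(int(time_ms), 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)

# TIME values of the two QueryStart entries in the case study.
for label, time_ms in (("create database", "1375245164667"),
                       ("drop database", "1375256014799")):
    print(label, query_time_utc(time_ms).strftime("%Y-%m-%d %H:%M:%S"))
```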

•  Customer manually placed their own data in /hive, the
warehouse directory managed and controlled by Hive
•  Customer used CREATE and DROP database commands in
their code
o  Hive deletes database and table locations in /hive with
impunity
•  Why didn’t deleted data end up in .Trash?
o  Trash collection not turned on in configuration settings
o  It is now, but need a -skipTrash option (HIVE-6469)

•  Hadoop forensics: piece together disparate sources…
o  Hadoop daemon logs (NameNode)
o  Hive query and metastore logs
o  Hadoop config files
•  Need better tools to correlate the different layers of the
system: hive client, hive metastore, MapReduce job,
YARN, HDFS, operating system metrics, …
By the way… Operating any distributed system would be
totally insane without NTP and a standard time zone (UTC).

CASE STUDY – ANALYZE QUERY
•  Customer provided Hive query + data sets
(100GBs to ~5 TBs)
•  Needed help optimizing the query
•  Didn’t rewrite query immediately
•  Wanted to characterize query performance and isolate
bottlenecks first

ANALYZE AND TUNE EXECUTION
•  Ran original query on the datasets in our environment:
o  Two M/R Stages: Stage-1, Stage-2
•  Long running reducers run out of memory
o  set mapreduce.reduce.memory.mb=5120
o  Reduces slots and extends reduce time
•  Query fails to launch Stage-2 with out of memory
o  set HADOOP_HEAPSIZE=1024 on client machine
•  Query has 250,000 Mappers in Stage-2, which causes
failure
o  set mapred.max.split.size=5368709120 (5 GiB)
to reduce Mappers

ANALYSIS: HOW TO VISUALIZE?
•  Next challenge - how to visualize job execution?
•  Existing hadoop/hive logs not sufficient for this task
•  Wrote internal tools
o  parse job history files
o  plot mapper and reducer execution

ANALYSIS: REDUCE STAGE-1
[Figure: plot of Stage-1 reduce task execution, annotated “Single reduce task”.]

ANALYZE EXECUTION: FINDINGS
•  Lone, long running reducer in first stage of query
•  Analyzed input data:
o  Query split input data by userId
o  Bucketized input data by userId
o  One very large bucket: “invalid” userId
o  Discussed “invalid” userId with customer
•  An error value is a common pattern!
o  Need to differentiate between “Don’t know and don’t care”
or “don’t know and do care.”
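
A quick way to spot this kind of skew before running the full query is to count how many rows each key would send to a reducer. This is a sketch on a small in-memory sample; the 20% threshold and the sample data are illustrative assumptions, not values from the case study.

```python
from collections import Counter

def find_skewed_keys(keys, threshold=0.2):
    """Return (key, count) for keys holding at least `threshold` of all rows."""
    counts = Counter(keys)
    total = sum(counts.values())
    return [(key, n) for key, n in counts.most_common() if n / total >= threshold]

# Hypothetical sample: an error value dominating otherwise well-spread userIds.
sample_user_ids = ["invalid"] * 60 + ["user%d" % i for i in range(40)]
print(find_skewed_keys(sample_user_ids))
```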

INTERACTIVE (DRAM CENTRIC)
PROCESSING SYSTEMS
•  Loading data into DRAM makes processing fast!
•  Examples: Spark, Impala, 0xdata, …, [SAP HANA], …
•  Streaming systems (Storm, DataTorrent) may be similar
•  Need to increase YARN container memory size

HIVE + INTERACTIVE:
WATCH OUT FOR CONTAINER SIZE
•  Caution: larger YARN container settings for interactive
jobs may not be right for batch systems like Hive
•  Container size: needs to combine vcores and memory:
yarn.scheduler.maximum-allocation-vcores
yarn.nodemanager.resource.cpu-vcores ...

HIVE + INTERACTIVE:
WATCH OUT FOR FRAGMENTATION
•  Attempting to schedule interactive systems and batch
systems like Hive may result in fragmentation
•  Interactive systems may require all-or-nothing scheduling
•  Batch jobs with little tasks may starve interactive jobs
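
A toy illustration of the fragmentation problem, considering memory only. Plenty of memory can be free across the cluster while no single node has room for one large interactive container. All node counts and sizes below are hypothetical, and real YARN scheduling also weighs vcores, queues, and locality.

```python
def can_place(free_mb_per_node, container_mb, count):
    """All-or-nothing check: do `count` containers of `container_mb` fit right now?"""
    slots = sum(free // container_mb for free in free_mb_per_node)
    return slots >= count

# Hypothetical cluster: 10 nodes, each left with 3 GB free by small batch tasks.
free_mb = [3072] * 10
print("total free MB:", sum(free_mb))
print("4 x 4096 MB interactive containers:", can_place(free_mb, 4096, 4))
print("30 x 1024 MB batch containers:", can_place(free_mb, 1024, 30))
```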

HIVE + INTERACTIVE:
WATCH OUT FOR FRAGMENTATION
Solutions for fragmentation…
•  Reserve interactive nodes before starting batch jobs
•  Reduce interactive container size (if the algorithm permits)
•  Node labels (YARN-2492) and gang scheduling (YARN-624)

CONCLUSIONS
•  Hive + Hadoop debugging can get very complex
o  Sifting through many logs and screens
o  Automatic transmission versus manual transmission
•  Static partitioning induced by Java Virtual Machine has
benefits but also induces challenges.
•  Where there are difficulties, there’s opportunity:
o  Better tooling, instrumentation, integration of logs/metrics
•  YARN still evolving into an operating system
•  Hadoop as a Service: aggregate and share expertise
•  Need to learn from the traditional database community!
