A brief introduction to YARN: how and why it came into existence and how it fits together with this thing called Hadoop.
It focuses on architecture, availability, resource management and scheduling, migration from MR1 to MR2, job history and logging, interfaces, and applications.
48. 48
YARN Scheduling
[Diagram: ResourceManager; Application Master 1 and Application Master 2; Node 1, Node 2, Node 3]
• ResourceManager to an Application Master: “Here’s a security token to let you launch a container on Node 1.”
49. 49
YARN Scheduling
[Diagram: ResourceManager; Application Master 1 and Application Master 2; Node 1, Node 2, Node 3]
• Application Master to a node: “Hey, launch my container with this shell command.”
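The two-step handshake on these slides (the ResourceManager hands out a token, then the Application Master presents it to a node along with a shell command) can be sketched as a toy model. Everything below is an assumption made for illustration: the function names, the shared secret, and the HMAC scheme are hypothetical and are not YARN's actual token format or API.

```python
import hashlib
import hmac
import secrets

# Hypothetical: a secret known to the ResourceManager and the nodes,
# but not to Application Masters.
CLUSTER_SECRET = secrets.token_bytes(32)


def issue_container_token(app_master: str, node: str) -> str:
    """ResourceManager side: sign a grant for one container on one node."""
    message = f"{app_master}:{node}".encode()
    return hmac.new(CLUSTER_SECRET, message, hashlib.sha256).hexdigest()


def launch_container(app_master: str, node: str, token: str, shell_command: str) -> str:
    """Node side: verify the token before accepting the launch request."""
    expected = issue_container_token(app_master, node)
    if not hmac.compare_digest(expected, token):
        raise PermissionError(f"{app_master} holds no valid token for {node}")
    return f"{node} launching for {app_master}: {shell_command}"


# "Here's a security token to let you launch a container on Node 1"
token = issue_container_token("ApplicationMaster 1", "Node 1")
# "Hey, launch my container with this shell command"
print(launch_container("ApplicationMaster 1", "Node 1", token, "echo hello"))
```

The point of the design is that the node never has to ask the ResourceManager whether a launch request is legitimate; the token itself carries the proof.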
52. 52
FairScheduler (FS)
• When space becomes available to run a task on the cluster, which application do we give it to?
• Find the job that is using the least space.
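The "least space wins" rule can be sketched in a few lines. This is a simplified illustration, not the FairScheduler's code: the function name, the job names, and the fixed 1 GB slot size are assumptions, and memory is the only resource considered.

```python
from typing import Dict, List


def assign_free_slots(usage_gb: Dict[str, float], free_slots: int,
                      slot_gb: float = 1.0) -> List[str]:
    """Give each free slot to whichever job currently uses the least memory."""
    usage = dict(usage_gb)  # work on a copy; leave the caller's data untouched
    assignments = []
    for _ in range(free_slots):
        job = min(usage, key=usage.get)  # ties go to the first job listed
        usage[job] += slot_gb
        assignments.append(job)
    return assignments


# Hypothetical cluster state: memory in use per job, in GB.
print(assign_free_slots({"job_a": 3.0, "job_b": 1.0, "job_c": 2.0}, free_slots=3))
```

Because usage is updated after every assignment, jobs that start behind catch up first and the allocation drifts toward equal shares.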
53. 53
FS: Apps and Queues
• Apps go in “queues”
• Share fairly between queues
• Share fairly between apps within queues
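A small worked example of the two levels of sharing: split capacity evenly between queues, then evenly between the apps inside each queue. This is a sketch under simplifying assumptions (equal weights, a single resource, only queues with running apps counted); the queue and app names are hypothetical.

```python
from typing import Dict, List


def two_level_fair_shares(capacity_gb: float,
                          queues: Dict[str, List[str]]) -> Dict[str, float]:
    """Return each app's share: capacity / active queues / apps in its queue."""
    active = {name: apps for name, apps in queues.items() if apps}
    shares: Dict[str, float] = {}
    for apps in active.values():
        queue_share = capacity_gb / len(active)
        for app in apps:
            shares[app] = queue_share / len(apps)
    return shares


shares = two_level_fair_shares(
    12.0, {"marketing": ["etl", "report"], "sales": ["forecast"]}
)
print(shares)
```

Note that the lone app in the second queue gets twice as much as either app in the first: fairness is applied between queues before it is applied between apps.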
55. 55
FS: Fast and Slow Lanes
• Root: Mem Capacity 12 GB, CPU Capacity 24 cores
• Marketing: Fair Share Mem 4 GB, Fair Share CPU 8 cores
• Sales: Fair Share Mem 4 GB, Fair Share CPU 8 cores
• Fast Lane: Max Share Mem 1 GB, Max Share CPU 1 core
• Slow Lane: Fair Share Mem 3 GB, Fair Share CPU 7 cores
56. 56
FS: Fairness for Hierarchies
• Traverse the tree starting at the root queue
• Offer resources to subqueues in order of how few resources they’re using
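The traversal can be sketched as a walk from the root down to a leaf queue, choosing the least-used subqueue at each level. This is a minimal illustration with assumed names (`Queue`, `offer`) and memory as the only resource; it picks just the first candidate at each level, whereas the slide describes offering to subqueues in order, so a fuller version would fall through to the next subqueue when the first cannot use the resources.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Queue:
    name: str
    used_gb: float = 0.0  # only meaningful for leaf queues
    children: List["Queue"] = field(default_factory=list)

    def usage(self) -> float:
        """A parent's usage is the sum of its subqueues' usage."""
        if self.children:
            return sum(child.usage() for child in self.children)
        return self.used_gb


def offer(queue: Queue) -> Queue:
    """Start at the given queue and descend to the least-used leaf."""
    while queue.children:
        queue = min(queue.children, key=lambda child: child.usage())
    return queue


# Hypothetical tree and usage figures.
root = Queue("root", children=[
    Queue("marketing", children=[Queue("fast_lane", 1.0), Queue("slow_lane", 2.0)]),
    Queue("sales", 2.0),
])
print(offer(root).name)
```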
66. 66
Job History
• Job History Viewing was moved to its own server: Job History Server
• Helps with load on RM (JT equivalent)
• Helps separate MR from YARN
67. 67
How History Flows
• AM
  • While running, keeps track of all events during execution
  • On success, before finishing up
    • Writes the history information to done_intermediate_dir
• The JHS
  • periodically scans the done_intermediate dir
  • moves the files to done_dir
  • starts showing the history
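The hand-off between the AM and the JHS can be sketched with two local directories standing in for the HDFS locations. This is an illustration only: the function names and event labels are hypothetical, the directory layout is flattened to a single level, and the real components are not implemented this way.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def am_write_history(intermediate_dir: Path, job_id: str, events: List[str]) -> Path:
    """AM side: on success, dump the recorded events into the intermediate dir."""
    path = intermediate_dir / f"{job_id}.jhist"
    path.write_text("\n".join(events))
    return path


def jhs_scan_once(intermediate_dir: Path, done_dir: Path) -> List[str]:
    """JHS side: one periodic scan that moves finished histories to the done dir."""
    moved = []
    for path in sorted(intermediate_dir.glob("*.jhist")):
        shutil.move(str(path), str(done_dir / path.name))
        moved.append(path.name)
    return moved  # these are the histories the JHS can now show


with tempfile.TemporaryDirectory() as tmp:
    intermediate = Path(tmp) / "done_intermediate"
    done = Path(tmp) / "done"
    intermediate.mkdir()
    done.mkdir()
    am_write_history(intermediate, "job_0001", ["JOB_SUBMITTED", "JOB_FINISHED"])
    print(jhs_scan_once(intermediate, done))
```

A history file is invisible to users between the AM's write and the next scan, which is why the scan frequency matters.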
68. 68
History: Important Configuration Properties
• yarn.app.mapreduce.am.staging-dir
  • Default (CM): /user ← Want this also for security
  • Default (CDH): /tmp/hadoop-yarn/staging
  • Staging directory for MapReduce applications
• mapreduce.jobhistory.done-dir
  • Default: ${yarn.app.mapreduce.am.staging-dir}/history/done
  • Final location in HDFS for history files
• mapreduce.jobhistory.intermediate-done-dir
  • Default: ${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate
  • Location in HDFS where AMs dump history files
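Since both history directories default to paths under the staging directory, changing the staging directory moves them too. The sketch below expands the `${...}` references to show the resulting paths; `resolve` is a hypothetical helper written for this example, not part of Hadoop.

```python
import re
from typing import Dict

_REFERENCE = re.compile(r"\$\{([^}]+)\}")


def resolve(props: Dict[str, str], key: str, max_depth: int = 20) -> str:
    """Expand ${name} references in props[key] using other entries of props."""
    value = props[key]
    for _ in range(max_depth):  # bounded, so a circular reference cannot loop forever
        expanded = _REFERENCE.sub(lambda m: props.get(m.group(1), m.group(0)), value)
        if expanded == value:
            break
        value = expanded
    return value


props = {
    "yarn.app.mapreduce.am.staging-dir": "/user",
    "mapreduce.jobhistory.done-dir":
        "${yarn.app.mapreduce.am.staging-dir}/history/done",
    "mapreduce.jobhistory.intermediate-done-dir":
        "${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate",
}
for key in props:
    print(key, "=", resolve(props, key))
```

With the CDH default staging directory instead, the same defaults would expand to paths under /tmp/hadoop-yarn/staging/history.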
69. 69
History: Important Configuration Properties
• mapreduce.jobhistory.max-age-ms
  • Default: 604800000 (7 days)
  • Max age before JHS deletes history
• mapreduce.jobhistory.move.interval-ms
  • Default: 180000 (3 min)
  • Frequency at which JHS scans the done_intermediate dir
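The millisecond defaults are easier to sanity-check once converted. The sketch below confirms the two conversions and shows the age comparison implied by max-age-ms; `is_expired` is a hypothetical helper, and exactly how the JHS measures a history's age is an assumption here.

```python
from datetime import timedelta

MAX_AGE_MS = 604_800_000     # mapreduce.jobhistory.max-age-ms default
MOVE_INTERVAL_MS = 180_000   # mapreduce.jobhistory.move.interval-ms default

print(timedelta(milliseconds=MAX_AGE_MS))        # 7 days, 0:00:00
print(timedelta(milliseconds=MOVE_INTERVAL_MS))  # 0:03:00


def is_expired(finished_at_ms: int, now_ms: int, max_age_ms: int = MAX_AGE_MS) -> bool:
    """True once a history is older than the configured maximum age."""
    return now_ms - finished_at_ms > max_age_ms
```

With these defaults, a finished job may take up to roughly three minutes to appear in the JHS, and its history is kept for about a week.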
70. 70
History: Miscellaneous
• The JHS runs as ‘mapred’, the AM runs as the user who submitted the job, and the RM runs as ‘yarn’
• The done-intermediate dir needs to be writable by the user who submitted the job and readable by ‘mapred’
• The RM, AM, and JHS should have identical versions of the jobhistory-related properties so they all “agree”
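One way to check that the three components "agree" is to compare the relevant properties across their effective configurations. This sketch assumes the configurations have already been loaded into plain dictionaries; the helper name and the choice of which keys count as jobhistory-related (the three directory properties from the earlier slides) are assumptions.

```python
from typing import Dict, Optional

JOBHISTORY_KEYS = (
    "yarn.app.mapreduce.am.staging-dir",
    "mapreduce.jobhistory.done-dir",
    "mapreduce.jobhistory.intermediate-done-dir",
)


def find_disagreements(
    configs: Dict[str, Dict[str, str]]
) -> Dict[str, Dict[str, Optional[str]]]:
    """Map each mismatched property to the value seen by each component."""
    mismatches = {}
    for key in JOBHISTORY_KEYS:
        values = {component: props.get(key) for component, props in configs.items()}
        if len(set(values.values())) > 1:
            mismatches[key] = values
    return mismatches


# Hypothetical example: the AM was started with a different staging dir.
configs = {
    "RM": {"yarn.app.mapreduce.am.staging-dir": "/user"},
    "AM": {"yarn.app.mapreduce.am.staging-dir": "/tmp/hadoop-yarn/staging"},
    "JHS": {"yarn.app.mapreduce.am.staging-dir": "/user"},
}
print(find_disagreements(configs))
```

A mismatch like this one would have the AM writing history files where the JHS never scans, so the job would finish but never show up in the history view.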