1 
YARN
Alex Moundalexis
@technmsg
CC BY 2.0 / Richard Bumgardner
Been there, done that.
3 
Alex @ Cloudera
• Solutions Architect
• AKA consultant
• government
• Infrastructure
©2014 Cloudera, Inc. All rights reserved.
4 
What Does Cloudera Do?
• product
  • distribution of Hadoop components, Apache licensed
  • enterprise tooling
• support
• training
• services (aka consulting)
• community
5 
Disclaimer
• Cloudera builds software
  • most donated to Apache
  • some closed-source
• Cloudera “products” I reference are open source
  • Apache Licensed
  • source code is on GitHub
  • https://github.com/cloudera
6 
What This Talk Isn’t About
• deploying
  • Puppet, Chef, Ansible, homegrown scripts, intern labor
• sizing & tuning
  • depends heavily on data and workload
• coding
  • line diagrams don’t count
• algorithms
  • I suck at math, ask anyone
7 
So What ARE We Talking About?
• Why YARN?
• Architecture
• Availability
• Resources & Scheduling
• MR1 to MR2 Gotchas
• History
• Interfaces
• Applications
• Storytime
YARN
9 
Why “Ecosystem?”
• In the beginning, just Hadoop
  • HDFS
  • MapReduce
• Today, dozens of interrelated components
  • I/O
  • Processing
  • Specialty Applications
  • Configuration
  • Workflow
10 
Partial Ecosystem
[Diagram labels, in slide order. Center: Hadoop. Sources: external system, web server, device logs, RDBMS / DWH. Ingest: API access, log collection, DB table import. Processing: batch processing, machine learning, SQL, Search. Consumers: external system, user, RDBMS / DWH, BI tool + JDBC/ODBC. Egress: API access, DB table export.]
11 
HDFS
• Distributed, highly fault-tolerant filesystem
• Optimized for large streaming access to data
• Based on Google File System
  • http://research.google.com/archive/gfs.html
12 
Lots of Commodity Machines
Image: Yahoo! Hadoop cluster [ OSCON ’07 ]
13 
MapReduce (MR)
• Programming paradigm
• Batch oriented, not realtime
• Works well with distributed computing
• Lots of Java, but other languages supported
• Based on Google’s paper
  • http://research.google.com/archive/mapreduce.html
14 
MR1 Components
• JobTracker
  • accepts jobs from client
  • schedules jobs on particular nodes
  • accepts status data from TaskTrackers
• TaskTracker
  • one per node
  • manages tasks
  • crunches data in-place
  • reports to JobTracker
15 
Under the Covers
16 
You specify map() and reduce() functions.
The framework does the rest.
But wait…
WHY DO WE NEED THIS?
18
YARN
20 
YARN
Yet Another Ridiculous Name
22 
YARN
Yet Another Resource Negotiator
23 
Why YARN / MR2?
• Scalability
  • JT kept track of individual tasks and wouldn’t scale
• Utilization
  • All slots are equal even if the work is not equal
• Multi-tenancy
  • Every framework shouldn’t need to write its own execution engine
  • All frameworks should share the resources on a cluster
24 
An Operating System?
Traditional Operating System
• Storage: File System
• Execution/Scheduling: Processes/Kernel Scheduler
Hadoop
• Storage: Hadoop Distributed File System (HDFS)
• Execution/Scheduling: Yet Another Resource Negotiator (YARN)
25 
Multiple levels of scheduling
• YARN
  • Which application (framework) to give resources to?
• Application (Framework - MR etc.)
  • Which task within the application should use these resources?
YARN
27 
Architecture
28 
Architecture – running multiple applications
29 
Control Flow: Submit application
30 
Control Flow: Get application updates
31 
Control Flow: AM asking for resources
32 
Control Flow: AM using containers
33 
Execution Modes
• Local mode
• Uber mode
• Executors
  • DefaultContainerExecutor
  • LinuxContainerExecutor
YARN
35 
Availability
[Diagram labels: Client (x3); Client Failover (x2); RM with Elector (x2); ZK Store; NM (x4)]
36 
Availability – Subtleties
• Embedded leader elector
  • No need for a separate daemon like ZKFC
• Implicit fencing using ZKRMStateStore
  • Active RM claims exclusive access to store through ACL magic
37 
Availability – Implications
• Previously submitted applications continue to run
  • New Application Masters are created
  • If the AM checkpoints state, it can continue from where it left off
  • MR keeps track of completed tasks, so they don’t have to be re-run
• Future
  • Work-preserving RM Restart / Failover
38 
Availability – Implications
• Transparent to clients
  • RM unavailable for a small duration
  • Automatically fail over to the Active RM
• Web UI redirects
• REST API redirects (starting in 5.1.0)
YARN
40 
Resource Model and Capacities
• Resource vectors
  • e.g. 1024 MB, 2 vcores, …
  • No more task slots!
• Nodes specify the amount of resources they have
  • yarn.nodemanager.resource.memory-mb
  • yarn.nodemanager.resource.cpu-vcores
• vcores to cores relation, not really “virtual”
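A minimal sketch of the resource-vector idea above, assuming a node advertises the two capacities named on the slide and containers are requested as (memory, vcores) pairs. The `Resource` class and `max_containers` helper are hypothetical names for illustration, not YARN APIs, and the node capacity values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A resource vector: memory in MB plus virtual cores."""
    memory_mb: int
    vcores: int

    def fits_in(self, capacity: "Resource") -> bool:
        # A vector fits only if every dimension fits.
        return (self.memory_mb <= capacity.memory_mb
                and self.vcores <= capacity.vcores)


# Hypothetical node capacity, standing in for the values a node would set in
# yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores.
node_capacity = Resource(memory_mb=8192, vcores=8)

# The example request from the slide: 1024 MB, 2 vcores.
request = Resource(memory_mb=1024, vcores=2)


def max_containers(capacity: Resource, req: Resource) -> int:
    """How many identical containers fit; the scarcer dimension decides."""
    return min(capacity.memory_mb // req.memory_mb,
               capacity.vcores // req.vcores)


print(max_containers(node_capacity, request))
```

With these made-up numbers memory would allow eight containers but vcores allow only four, which is the point of tracking more than one dimension instead of counting slots.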
41 
Resources and Scheduling
• What you request is what you get
  • No more fixed-size slots
  • Framework/application requests resources for a task
• MR AM requests resources for map and reduce tasks; these requests can potentially be for different amounts of resources
42 
YARN Scheduling
[Diagram, repeated on slides 42 to 50: ResourceManager; Application Master 1; Application Master 2; Node 1; Node 2; Node 3. Captions by slide:]
• 43: I want 2 containers with 1024 MB and 1 core each
• 44: Noted
• 45: I’m still here
• 46: I’ll reserve some space on node1 for AM1
• 47: Got anything for me?
• 48: Here’s a security token to let you launch a container on Node 1
• 49: Hey, launch my container with this shell command
• 50: (a Container appears in the diagram)
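The exchange on slides 43 to 50 can be sketched as a toy model. Who says each caption is an assumption here (the AM asks, the RM notes it, a node heartbeats, the AM polls), since the extracted text does not attribute them. `ToyResourceManager` and its methods are hypothetical and are not the YARN API; the security token and the container launch itself are left out.

```python
from collections import deque


class ToyResourceManager:
    """Toy model of the request / heartbeat / allocate loop."""

    def __init__(self, nodes):
        self.free_mb = dict(nodes)   # node name -> free memory (MB)
        self.pending = deque()       # outstanding (app, memory_mb) asks
        self.granted = {}            # app -> list of (node, memory_mb)

    def request(self, app, count, memory_mb):
        """The AM asks for containers; the RM only records the ask ("Noted")."""
        for _ in range(count):
            self.pending.append((app, memory_mb))

    def node_heartbeat(self, node):
        """A node reports in; the RM places pending asks that fit on it."""
        still_waiting = deque()
        while self.pending:
            app, memory_mb = self.pending.popleft()
            if memory_mb <= self.free_mb[node]:
                self.free_mb[node] -= memory_mb
                self.granted.setdefault(app, []).append((node, memory_mb))
            else:
                still_waiting.append((app, memory_mb))
        self.pending = still_waiting

    def poll(self, app):
        """The AM asks "Got anything for me?" and collects its grants."""
        return self.granted.pop(app, [])


rm = ToyResourceManager({"node1": 1536, "node2": 4096, "node3": 4096})
rm.request("AM1", count=2, memory_mb=1024)
first_poll = rm.poll("AM1")      # nothing is granted before a node heartbeats
rm.node_heartbeat("node1")       # room for one 1024 MB container only
rm.node_heartbeat("node2")       # the second ask lands here
grants = rm.poll("AM1")
print(first_poll, grants)
```

The sketch keeps one property of the slides: allocation happens when a node checks in, not at the moment the AM asks.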
51 
Resources on a Node
5 GB:
• Map: 512 MB
• Reduce: 1536 MB
• Map: 1024 MB
• Map: 256 MB
• Map: 256 MB
• Reduce: 512 MB
• MR-AM: 1024 MB
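As a quick check of the numbers on this slide, the container sizes can be summed against the 5 GB node. This assumes 1 GB = 1024 MB; the variable names are only for illustration.

```python
# Container sizes in MB, in the order they appear on the slide.
containers = [
    ("Map", 512),
    ("Reduce", 1536),
    ("Map", 1024),
    ("Map", 256),
    ("Map", 256),
    ("Reduce", 512),
    ("MR-AM", 1024),
]

node_capacity_mb = 5 * 1024
used_mb = sum(size for _, size in containers)
free_mb = node_capacity_mb - used_mb

print(used_mb, free_mb)
```

The seven differently sized containers fill the node exactly, which fixed-size slots could not do.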
52 
FairScheduler (FS)
• When space becomes available to run a task on the cluster, which application do we give it to?
• Find the job that is using the least space.
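The rule above can be sketched in a few lines, assuming "space" means memory in use and ignoring queues, weights, and minimum shares. `pick_next_app` is a hypothetical helper, not a FairScheduler API, and the usage figures are made up.

```python
def pick_next_app(usage_mb):
    """Return the app currently using the least memory.

    usage_mb maps app name -> memory in use (MB). Ties go to the name that
    sorts first, which is an arbitrary choice made for this sketch.
    """
    return min(sorted(usage_mb), key=lambda app: usage_mb[app])


usage = {"job_a": 3072, "job_b": 1024, "job_c": 2048}
next_app = pick_next_app(usage)
print(next_app)
```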
53 
FS: Apps and Queues
• Apps go in “queues”
• Share fairly between queues
• Share fairly between apps within queues
54 
FS: Hierarchical Queues
• Root: Mem Capacity 12 GB, CPU Capacity 24 cores
• Marketing: Fair Share Mem 4 GB, Fair Share CPU 8 cores
• RD: Fair Share Mem 4 GB, Fair Share CPU 8 cores
• Sales: Fair Share Mem 4 GB, Fair Share CPU 8 cores
• Jim’s Team: Fair Share Mem 2 GB, Fair Share CPU 4 cores
• Bob’s Team: Fair Share Mem 2 GB, Fair Share CPU 4 cores
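The fair-share figures on this slide are consistent with splitting each parent's share equally among its children. A minimal sketch under that assumption (equal weights, no minimum shares); `equal_shares` is a hypothetical helper, and placing the two team queues under Marketing is also an assumption, since the extracted text does not say which queue is their parent.

```python
def equal_shares(parent_share, children):
    """Split a parent's (memory_gb, cores) share equally among its children."""
    mem_gb, cores = parent_share
    n = len(children)
    return {child: (mem_gb / n, cores / n) for child in children}


root = (12, 24)  # 12 GB, 24 cores
top_level = equal_shares(root, ["Marketing", "RD", "Sales"])

# Assumption: both team queues sit under one 4 GB / 8 core parent.
teams = equal_shares(top_level["Marketing"], ["Jim's Team", "Bob's Team"])

print(top_level)
print(teams)
```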
55 
FS: Fast and Slow Lanes
• Root: Mem Capacity 12 GB, CPU Capacity 24 cores
• Marketing: Fair Share Mem 4 GB, Fair Share CPU 8 cores
• Sales: Fair Share Mem 4 GB, Fair Share CPU 8 cores
• Fast Lane: Max Share Mem 1 GB, Max Share CPU 1 core
• Slow Lane: Fair Share Mem 3 GB, Fair Share CPU 7 cores
56 
FS: Fairness for Hierarchies
• Traverse the tree starting at the root queue
• Offer resources to subqueues in order of how few resources they’re using
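The traversal described above can be sketched as a walk from the root that always descends into the least-used subqueue. This is a simplified illustration that measures usage in memory only; the `Queue` class, the tree layout, and the usage numbers are all hypothetical.

```python
class Queue:
    """A queue is either a parent (has children) or a leaf (has apps)."""

    def __init__(self, name, children=None, apps=None):
        self.name = name
        self.children = children or []
        self.apps = apps or {}  # leaf queues only: app name -> memory in use (MB)

    def usage_mb(self):
        if self.children:
            return sum(child.usage_mb() for child in self.children)
        return sum(self.apps.values())


def next_offer(root):
    """Descend from the root, always into the least-used subqueue."""
    queue = root
    while queue.children:
        queue = min(queue.children, key=lambda q: q.usage_mb())
    app = min(queue.apps, key=queue.apps.get) if queue.apps else None
    return queue.name, app


root = Queue("root", children=[
    Queue("marketing", children=[
        Queue("jims_team", apps={"app_a": 1024}),
        Queue("bobs_team", apps={"app_b": 512}),
    ]),
    Queue("rd", apps={"app_c": 2048}),
    Queue("sales", apps={"app_d": 3072}),
])

print(next_offer(root))
```

Here marketing uses 1536 MB in total, less than rd or sales, so the offer goes down that branch and then to the team using less.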
57 
FS: Hierarchical Queues
[Diagram labels: Root; Marketing, RD, Sales; Jim’s Team, Bob’s Team]
58 
FS: Multi-resource scheduling
• Scheduling based on multiple resources
  • CPU, memory
  • Future: Disk, Network
• Why multiple resources?
  • Better utilization
  • More fair
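The slide does not name an algorithm. One known way to compare applications across CPU and memory is dominant resource fairness, which the Fair Scheduler offers as a policy; treating it as what the slide means is an assumption. A minimal sketch with made-up usage numbers and hypothetical names:

```python
def dominant_share(usage, capacity):
    """Largest fraction of any single cluster resource that the app holds."""
    return max(usage[r] / capacity[r] for r in capacity)


capacity = {"memory_mb": 12 * 1024, "vcores": 24}

apps = {
    "cpu_heavy": {"memory_mb": 1024, "vcores": 12},
    "mem_heavy": {"memory_mb": 4096, "vcores": 2},
}

shares = {name: dominant_share(use, capacity) for name, use in apps.items()}

# The app with the smallest dominant share is served next.
next_app = min(shares, key=shares.get)
print(shares, next_app)
```

Judged on memory alone, cpu_heavy would look like the smaller consumer even though it already holds half the cluster's cores.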
59 
FS: More features
• Preemption
  • To avoid starvation, preempt tasks using more than their fair share after the preemption timeout
  • Warn applications. Application can choose to kill any of its containers
• Locality through delay scheduling
  • Try to give node-local, rack-local resources by waiting for some time
60 
Enforcing resource limits
• Memory
  • Monitor process usage and kill if it crosses the limit
  • Disable virtual memory checking
  • Physical memory checking is being improved
• CPU
  • Cgroups
61
Microsoft Office EULA. Really.
62 
MR1 to MR2 Gotchas
• AMs can take up all resources
  • Symptom: Submitted jobs don’t run
  • Fix in progress: limit the maximum number of applications
  • Workaround: use scheduler allocations to limit the number of applications
• How to run 4 maps and 2 reduces per node?
  • Don’t try to tune number of tasks per node
  • Set assignMultiple to false to spread allocations
63 
MR1 to MR2 Gotchas
• Comparing MR1 and MR2 benchmarks
  • TestDFSIO runs best on dedicated CPU/disk, harder to pin
  • TeraSort changed: less compressible == more network xfer
• Resource Allocation vs Resource Consumption
  • RM allocates resources, heap specified elsewhere
  • JVM overhead not included
  • Mind your mapred.[map|reduce].child.java.opts
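Because the container allocation and the JVM heap are set separately, one common approach is to derive the heap from the container size and leave headroom for JVM overhead. A minimal sketch of that idea; the 0.8 fraction is an assumed rule of thumb, not a value from the talk, and `heap_opts` is a hypothetical helper.

```python
def heap_opts(container_mb, heap_fraction=0.8):
    """Build a -Xmx flag that leaves headroom inside the container.

    heap_fraction is an assumed rule of thumb; tune it for the workload.
    """
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    heap_mb = int(container_mb * heap_fraction)
    return f"-Xmx{heap_mb}m"


map_opts = heap_opts(1024)      # for a 1024 MB map container
reduce_opts = heap_opts(2048)   # for a 2048 MB reduce container
print(map_opts, reduce_opts)
```

If the heap were set equal to the container size, the process could exceed its allocation once non-heap memory is counted.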
64 
MR1 to MR2 Gotchas
• Changes in logs, tracing problems harder
  • MR1: distributed grep on JobId
  • YARN logs more generic, deal with containers not apps
YARN
66 
Job History
• Job History Viewing was moved to its own server: Job History Server
• Helps with load on RM (JT equivalent)
• Helps separate MR from YARN
67 
How Does History Flow?
• AM
  • While running, keeps track of all events during execution
  • On success, before finishing up
    • Writes the history information to done_intermediate_dir
• The JHS
  • periodically scans the done_intermediate dir
  • moves the files to done_dir
  • starts showing the history
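The hand-off above can be sketched with local directories standing in for HDFS. This is an illustration of the flow only, not how the JHS is implemented; `scan_and_move` and the history file name are made up.

```python
import shutil
import tempfile
from pathlib import Path


def scan_and_move(intermediate: Path, done: Path) -> list:
    """One JHS-style pass: move every history file from intermediate to done."""
    moved = []
    for path in sorted(intermediate.iterdir()):
        if path.is_file():
            shutil.move(str(path), str(done / path.name))
            moved.append(path.name)
    return moved


with tempfile.TemporaryDirectory() as root:
    intermediate = Path(root) / "done_intermediate"
    done = Path(root) / "done"
    intermediate.mkdir()
    done.mkdir()

    # The AM writes its history just before finishing (file name is made up).
    (intermediate / "job_0001.jhist").write_text("events...")

    moved = scan_and_move(intermediate, done)
    visible = sorted(p.name for p in done.iterdir())
    left_behind = list(intermediate.iterdir())

print(moved, visible)
```

A job's history only becomes visible after the next scan, so a finished job may be missing from the JHS for up to one scan interval.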
68 
History: Important Configuration Properties
• yarn.app.mapreduce.am.staging-dir
  • Default (CM): /user ← Want this also for security
  • Default (CDH): /tmp/hadoop-yarn/staging
  • Staging directory for MapReduce applications
• mapreduce.jobhistory.done-dir
  • Default: ${yarn.app.mapreduce.am.staging-dir}/history/done
  • Final location in HDFS for history files
• mapreduce.jobhistory.intermediate-done-dir
  • Default: ${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate
  • Location in HDFS where AMs dump history files
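Since both history directories default to paths under the staging directory, changing the staging directory moves them too. A small sketch of that substitution using the defaults listed on the slide; the `resolve` helper is hypothetical and only imitates `${name}` expansion.

```python
import re

defaults = {
    "yarn.app.mapreduce.am.staging-dir": "/tmp/hadoop-yarn/staging",
    "mapreduce.jobhistory.done-dir":
        "${yarn.app.mapreduce.am.staging-dir}/history/done",
    "mapreduce.jobhistory.intermediate-done-dir":
        "${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate",
}


def resolve(key, conf):
    """Expand ${name} references in conf[key] using other entries of conf."""
    def expand(match):
        return resolve(match.group(1), conf)
    return re.sub(r"\$\{([^}]+)\}", expand, conf[key])


cdh_done = resolve("mapreduce.jobhistory.done-dir", defaults)

# Same properties with the CM default for the staging directory.
cm_conf = dict(defaults, **{"yarn.app.mapreduce.am.staging-dir": "/user"})
cm_intermediate = resolve("mapreduce.jobhistory.intermediate-done-dir", cm_conf)

print(cdh_done)
print(cm_intermediate)
```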
69 
History: Important Configuration Properties
• mapreduce.jobhistory.max-age-ms
  • Default: 604800000 (7 days)
  • Max age before JHS deletes history
• mapreduce.jobhistory.move.interval-ms
  • Default: 180000 (3 min)
  • Frequency at which JHS scans the done_intermediate dir
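The millisecond defaults are easy to misread, so a quick conversion confirms the units given on the slide. Variable names are only for illustration.

```python
from datetime import timedelta

max_age = timedelta(milliseconds=604_800_000)     # mapreduce.jobhistory.max-age-ms
move_interval = timedelta(milliseconds=180_000)   # mapreduce.jobhistory.move.interval-ms

# Number of scan passes that fit into one retention period.
scans_per_retention = max_age // move_interval

print(max_age, move_interval, scans_per_retention)
```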
70 
History: Miscellaneous
• The JHS runs as ‘mapred’, the AM runs as the user who submitted the job, and the RM runs as ‘yarn’
• The done-intermediate dir needs to be writable by the user who submitted the job and readable by ‘mapred’
• The RM, AM, and JHS should have identical versions of the jobhistory-related properties so they all “agree”
71 
Application History Server / Timeline Server
• Work in progress to capture history and other information for non-MR YARN applications
72 
YARN Container Logs
• While application is running
  • Local to the NM: yarn.nodemanager.log-dirs
• After application finishes
  • Logs aggregated to HDFS
  • yarn.nodemanager.remote-app-log-dir
• Disable aggregation?
  • yarn.log-aggregation-enable
YARN
YARN
75 
Interacting with a YARN cluster
• Java API
  • MR1 – MR2 APIs are compatible
• REST API
  • RM, NM, JHS – all have REST APIs that are very useful
• Llama (Long-Lived Application Master)
  • Cloudera Impala can reserve, use, and release resource allocations without using YARN-managed container processes
• CLI
  • yarn rmadmin, application, etc.
• Web UI
  • New and “improved” – need time to get used to
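As an illustration of the REST interface, the sketch below builds the ResourceManager's cluster applications URL and filters a response body. The host name is hypothetical, 8088 is assumed to be the RM web port, and the sample payload is a trimmed, made-up body; no request is sent.

```python
import json
from urllib.parse import urlencode


def apps_url(rm_host, port=8088, **filters):
    """Build the ResourceManager cluster-apps URL with optional query filters."""
    base = f"http://{rm_host}:{port}/ws/v1/cluster/apps"
    if not filters:
        return base
    return f"{base}?{urlencode(sorted(filters.items()))}"


def app_ids(payload, state):
    """Pull application IDs in the given state out of a JSON response body."""
    apps = (json.loads(payload).get("apps") or {}).get("app", [])
    return [a["id"] for a in apps if a.get("state") == state]


url = apps_url("rm.example.com", states="RUNNING")

# Trimmed, made-up response body for illustration.
sample = json.dumps({"apps": {"app": [
    {"id": "application_1400000000000_0001", "state": "RUNNING"},
    {"id": "application_1400000000000_0002", "state": "FINISHED"},
]}})
running = app_ids(sample, "RUNNING")

print(url)
print(running)
```

To query a real cluster, the URL could be passed to any HTTP client; the parsing helper also tolerates a body whose `apps` value is null.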
YARN
77 
YARN Applications
• MR2
• Cloudera Impala
• Apache Spark
• Others? Custom?
  • Apache Slider (incubating); not production-ready
    • Accumulo
    • HBase
    • Storm
YARN
79 
The Cloudera View of YARN
• Shipping
  • Enabled by default on CDH5+
  • Included for past two years, not enabled
• Supported
• Recommended
80 
Growing Pains
• Benchmarking is harder
  • different utilization paradigm
  • “whole cluster” benchmarks more important, e.g. SWIM
• Tuning still largely trial/error
  • MR1 was the same originally
  • YARN/MR2 will get there eventually
81 
What Are Customers Doing?
• A few are using in production
• Many are exploring
  • Spark
  • Impala via Llama
• Most are waiting
82 
Why not Mesos?
• Mesos
  • designed to be completely general purpose
  • more burden on app developer (offer model vs app request)
• YARN
  • designed with Hadoop in mind
  • supports Kerberos
  • more robust/familiar scheduling
  • rack/machine locality, out of box
• Supportability
  • all commercial Hadoop vendors support YARN
  • support for Mesos limited to startup Mesosphere
83 
Is This the End for MapReduce?
Extra special thanks:
ALL OF YOU
85 
Image Credits
• CC BY 2.0 flik https://flic.kr/p/4RVoUX
• CC BY 2.0 Ian Sane https://flic.kr/p/nRyHxd
• CC BY-NC 2.0 lollyknit https://flic.kr/p/49C1Xi
• CC BY-ND 2.0 jankunst https://flic.kr/p/deU71s
• CC BY-SA 2.0 pierrepocs https://flic.kr/p/9mgdMd
• CC BY-SA 2.0 bekathwia https://flic.kr/p/4FpABU
• CC BY-NC-ND 2.0 digitalnc https://flic.kr/p/dxyTt1
• CC BY-NC-ND 2.0 arselectronica https://flic.kr/p/7yw8z2
• CC BY-NC-ND 2.0 yum9me https://flic.kr/p/81hQ49
• CC BY-NC-SA 2.0 jimnix https://flic.kr/p/gsqpWC
• Microsoft Office EULA (really)
86 
Thank You!
Alex Moundalexis
@technmsg
Insert witty tagline here.

ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDELiveplex
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Brian Pichman
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarPrecisely
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-pyJamie (Taka) Wang
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 

Kürzlich hochgeladen (20)

Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-py
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 

YARN

  • 1. YARN • Alex Moundalexis • @technmsg
  • 2. Been there, done that. (Image: CC BY 2.0 / Richard Bumgardner)
  • 3. Alex @ Cloudera • Solutions Architect • AKA consultant • government • Infrastructure • ©2014 Cloudera, Inc. All rights reserved.
  • 4. What Does Cloudera Do? • product • distribution of Hadoop components, Apache licensed • enterprise tooling • support • training • services (aka consulting) • community
  • 5. Disclaimer • Cloudera builds software • most donated to Apache • some closed-source • Cloudera “products” I reference are open source • Apache Licensed • source code is on GitHub • https://github.com/cloudera
  • 6. What This Talk Isn’t About • deploying • Puppet, Chef, Ansible, homegrown scripts, intern labor • sizing & tuning • depends heavily on data and workload • coding • line diagrams don’t count • algorithms • I suck at math, ask anyone
  • 7. So What ARE We Talking About? • Why YARN? • Architecture • Availability • Resources & Scheduling • MR1 to MR2 Gotchas • History • Interfaces • Applications • Storytime
  • 9. Why “Ecosystem?” • In the beginning, just Hadoop • HDFS • MapReduce • Today, dozens of interrelated components • I/O • Processing • Specialty Applications • Configuration • Workflow
  • 10. Partial Ecosystem (diagram labels) • Hadoop • external system • web server • device logs • RDBMS / DWH • API access • log collection • DB table import • batch processing • machine learning • external system • API access • user • RDBMS / DWH • BI tool + JDBC/ODBC • SQL • Search • DB table export
  • 11. HDFS • Distributed, highly fault-tolerant filesystem • Optimized for large streaming access to data • Based on Google File System • http://research.google.com/archive/gfs.html
  • 12. Lots of Commodity Machines • Image: Yahoo! Hadoop cluster [OSCON ’07]
  • 13. MapReduce (MR) • Programming paradigm • Batch oriented, not realtime • Works well with distributed computing • Lots of Java, but other languages supported • Based on Google’s paper • http://research.google.com/archive/mapreduce.html
  • 14. MR1 Components • JobTracker • accepts jobs from client • schedules jobs on particular nodes • accepts status data from TaskTrackers • TaskTracker • one per node • manages tasks • crunches data in-place • reports to JobTracker
  • 15. Under the Covers
  • 16. You specify map() and reduce() functions. The framework does the rest.
  • 17. But wait… WHY DO WE NEED THIS?
  • 20. YARN: Yet Another Ridiculous Name
  • 22. YARN: Yet Another Resource Negotiator
  • 23. Why YARN / MR2? • Scalability • JT kept track of individual tasks and wouldn’t scale • Utilization • All slots are equal even if the work is not equal • Multi-tenancy • Every framework shouldn’t need to write its own execution engine • All frameworks should share the resources on a cluster
  • 24. An Operating System? • Traditional Operating System • Storage: File System • Execution/Scheduling: Processes/Kernel Scheduler • Hadoop • Storage: Hadoop Distributed File System (HDFS) • Execution/Scheduling: Yet Another Resource Negotiator (YARN)
  • 25. Multiple levels of scheduling • YARN • Which application (framework) to give resources to? • Application (Framework - MR etc.) • Which task within the application should use these resources?
  • 27. Architecture
  • 28. Architecture – running multiple applications
  • 29. Control Flow: Submit application
  • 30. Control Flow: Get application updates
  • 31. Control Flow: AM asking for resources
  • 32. Control Flow: AM using containers
  • 33. Execution Modes • Local mode • Uber mode • Executors • DefaultContainerExecutor • LinuxContainerExecutor
  • 35. Availability (diagram labels) • two RMs, each with an Elector • ZK Store • four NMs • three Clients • Client Failover
  • 36. Availability – Subtleties • Embedded leader elector • No need for a separate daemon like ZKFC • Implicit fencing using ZKRMStateStore • Active RM claims exclusive access to store through ACL magic
  • 37. Availability – Implications • Previously submitted applications continue to run • New Application Masters are created • If the AM checkpoints state, it can continue from where it left off • MR keeps track of completed tasks, so they don’t have to be re-run • Future • Work-preserving RM Restart / Failover
  • 38. Availability – Implications • Transparent to clients • RM unavailable for a small duration • Automatically fail over to the Active RM • Web UI redirects • REST API redirects (starting 5.1.0)
  • 40. Resource Model and Capacities • Resource vectors • e.g. 1024 MB, 2 vcores, … • No more task slots! • Nodes specify the amount of resources they have • yarn.nodemanager.resource.memory-mb • yarn.nodemanager.resource.cpu-vcores • vcores to cores relation, not really “virtual”
  • 41. Resources and Scheduling • What you request is what you get • No more fixed-size slots • Framework/application requests resources for a task • MR AM requests resources for map and reduce tasks; these requests can potentially be for different amounts of resources
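
The resource-vector idea from slides 40 and 41 can be sketched in a few lines of Python. This is an illustration only, not YARN code: the `Resource` class, the `allocate` helper, and the 8192 MB / 8 vcore node are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A resource vector: memory plus vcores (hypothetical helper type)."""
    memory_mb: int
    vcores: int

    def fits_in(self, other: "Resource") -> bool:
        # A request fits only if every dimension fits.
        return self.memory_mb <= other.memory_mb and self.vcores <= other.vcores

    def minus(self, other: "Resource") -> "Resource":
        return Resource(self.memory_mb - other.memory_mb, self.vcores - other.vcores)


def allocate(available: Resource, request: Resource):
    """Return the capacity left after granting the request, or None if it does not fit."""
    if not request.fits_in(available):
        return None
    return available.minus(request)


# Assumed node capacity, standing in for the two yarn.nodemanager.resource.* settings.
node = Resource(memory_mb=8192, vcores=8)
map_request = Resource(memory_mb=1024, vcores=1)
reduce_request = Resource(memory_mb=2048, vcores=2)  # a different size, as slide 41 allows

after_map = allocate(node, map_request)
after_reduce = allocate(after_map, reduce_request)
```

Because requests are vectors rather than slots, the map and reduce requests above consume different amounts of the same node.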
  • 42. YARN Scheduling (diagram: ResourceManager, Application Master 1, Application Master 2, Node 1, Node 2, Node 3)
  • 43. YARN Scheduling (same diagram) • “I want 2 containers with 1024 MB and 1 core each”
  • 44. YARN Scheduling (same diagram) • “Noted”
  • 45. YARN Scheduling (same diagram) • “I’m still here”
  • 46. YARN Scheduling (same diagram) • “I’ll reserve some space on node1 for AM1”
  • 47. YARN Scheduling (same diagram) • “Got anything for me?”
  • 48. YARN Scheduling (same diagram) • “Here’s a security token to let you launch a container on Node 1”
  • 49. YARN Scheduling (same diagram) • “Hey, launch my container with this shell command”
  • 50. YARN Scheduling (same diagram) • Container
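
The exchange on slides 43 to 50 can be read as a heartbeat-driven handshake: the AM asks the RM, a node heartbeat lets the RM place the request, the AM picks up a token on its next heartbeat, and then talks to the node directly. The toy simulation below is a sketch under that reading; the class and method names are hypothetical and do not correspond to the real YARN Java API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_mb: int
    free_vcores: int
    running: list = field(default_factory=list)

    def launch(self, token, command):
        # A node only runs a container if the token was issued for it.
        if token["node"] != self.name:
            raise PermissionError("token not valid for this node")
        self.running.append(command)


class ResourceManager:
    def __init__(self):
        self.pending = []  # outstanding (app, memory_mb, vcores) requests
        self.granted = {}  # app -> tokens waiting to be picked up

    def request(self, app, count, memory_mb, vcores):
        # "I want N containers with X MB and Y cores each" / "Noted"
        self.pending.extend((app, memory_mb, vcores) for _ in range(count))

    def node_heartbeat(self, node):
        # "I'm still here" / "I'll reserve some space on this node"
        still_pending = []
        for app, mb, vc in self.pending:
            if mb <= node.free_mb and vc <= node.free_vcores:
                node.free_mb -= mb
                node.free_vcores -= vc
                token = {"app": app, "node": node.name, "memory_mb": mb, "vcores": vc}
                self.granted.setdefault(app, []).append(token)
            else:
                still_pending.append((app, mb, vc))
        self.pending = still_pending

    def am_heartbeat(self, app):
        # "Got anything for me?" / "Here's a security token"
        return self.granted.pop(app, [])


rm = ResourceManager()
node1 = Node("node1", free_mb=4096, free_vcores=4)  # assumed capacity
rm.request("AM1", count=2, memory_mb=1024, vcores=1)
first_answer = rm.am_heartbeat("AM1")  # nothing yet: no node has reported in
rm.node_heartbeat(node1)
tokens = rm.am_heartbeat("AM1")
for token in tokens:
    node1.launch(token, "echo hello")  # "launch my container with this shell command"
```

The point of the sketch is the ordering: nothing is granted until a node reports in, and the AM only learns of a grant on its next poll.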
  • 51. Resources on a Node (5 GB) • Map: 512 MB • Reduce: 1536 MB • Map: 1024 MB • Map: 256 MB • Map: 256 MB • Reduce: 512 MB • MR AM: 1024 MB
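
As a quick check of the arithmetic on slide 51, the container sizes listed there add up to exactly the node's 5 GB. The variable names below are only for illustration.

```python
# Container sizes as listed on the slide, in MB.
containers_mb = [
    ("map", 512),
    ("reduce", 1536),
    ("map", 1024),
    ("map", 256),
    ("map", 256),
    ("reduce", 512),
    ("mr-am", 1024),
]

node_capacity_mb = 5 * 1024
total_mb = sum(size for _, size in containers_mb)
free_mb = node_capacity_mb - total_mb
```

Seven differently sized containers fill the node completely, which fixed-size slots could not do.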
  • 52. FairScheduler (FS) • When space becomes available to run a task on the cluster, which application do we give it to? • Find the job that is using the least space.
  • 53. FS: Apps and Queues • Apps go in “queues” • Share fairly between queues • Share fairly between apps within queues
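
A minimal sketch of the rule on slides 52 and 53, assuming "using the least space" means the smallest current memory usage and ignoring weights, minimum shares, and CPU. The queue and app names are made up, and `pick_next` is a hypothetical helper rather than FairScheduler code.

```python
def pick_next(queues):
    """Pick the least-loaded queue, then the least-loaded app inside it.

    `queues` maps queue name -> {app name: memory currently used, in MB}.
    """
    queue = min(queues, key=lambda name: sum(queues[name].values()))
    app = min(queues[queue], key=queues[queue].get)
    return queue, app


usage = {
    "marketing": {"etl": 3072, "report": 1024},  # 4096 MB in total
    "sales": {"forecast": 1536, "adhoc": 512},   # 2048 MB in total
}
next_queue, next_app = pick_next(usage)
```

Here `sales` wins because it uses less than `marketing`, and within `sales` the smaller app gets the next offer.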
  • 54. FS: Hierarchical Queues • Root: Mem Capacity 12 GB, CPU Capacity 24 cores • Marketing: Fair Share Mem 4 GB, Fair Share CPU 8 cores • RD: Fair Share Mem 4 GB, Fair Share CPU 8 cores • Sales: Fair Share Mem 4 GB, Fair Share CPU 8 cores • Jim’s Team: Fair Share Mem 2 GB, Fair Share CPU 4 cores • Bob’s Team: Fair Share Mem 2 GB, Fair Share CPU 4 cores
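
The numbers on slide 54 are consistent with an equal split at each level of the tree. The sketch below assumes equal weights and assumes the two teams sit under one of the 4 GB queues; the flattened slide does not say which, so `RD` is used purely for illustration.

```python
def equal_fair_shares(capacity, children):
    """Split a parent's capacity evenly among its children (equal weights assumed)."""
    share = capacity / len(children)
    return {child: share for child in children}


top_level = ["Marketing", "RD", "Sales"]
mem_gb = equal_fair_shares(12, top_level)
cpu_cores = equal_fair_shares(24, top_level)

# Assumption: both teams are children of the same 4 GB / 8 core queue.
teams = ["Jim's Team", "Bob's Team"]
team_mem_gb = equal_fair_shares(mem_gb["RD"], teams)
team_cpu_cores = equal_fair_shares(cpu_cores["RD"], teams)
```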
  • 55. FS: Fast and Slow Lanes • Root: Mem Capacity 12 GB, CPU Capacity 24 cores • Marketing: Fair Share Mem 4 GB, Fair Share CPU 8 cores • Sales: Fair Share Mem 4 GB, Fair Share CPU 8 cores • Fast Lane: Max Share Mem 1 GB, Max Share CPU 1 core • Slow Lane: Fair Share Mem 3 GB, Fair Share CPU 7 cores
  • 56. FS: Fairness for Hierarchies • Traverse the tree starting at the root queue • Offer resources to subqueues in order of how few resources they’re using
  • 57. FS: Hierarchical Queues (diagram labels) • Root • Marketing • RD • Sales • Jim’s Team • Bob’s Team
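
The traversal described on slide 56 can be sketched as a walk from the root that always descends into the child using the least. The tree shape and usage figures below are invented for the example (in particular, placing the two teams under `RD` is an assumption), and the real scheduler considers more than raw memory usage.

```python
def usage_mb(queue):
    """Memory used by a queue: its own figure for a leaf, the sum of its children otherwise."""
    if "children" in queue:
        return sum(usage_mb(child) for child in queue["children"])
    return queue["used_mb"]


def next_leaf(queue):
    """Walk down from the root, always descending into the least-used subqueue."""
    while "children" in queue:
        queue = min(queue["children"], key=usage_mb)
    return queue["name"]


root = {
    "name": "Root",
    "children": [
        {"name": "Marketing", "used_mb": 3072},
        {"name": "RD", "children": [
            {"name": "Jim's Team", "used_mb": 1024},
            {"name": "Bob's Team", "used_mb": 512},
        ]},
        {"name": "Sales", "used_mb": 2048},
    ],
}
offered_to = next_leaf(root)
```

`RD` uses 1536 MB in total, less than its siblings, so the offer goes there first and then to the lighter of its two teams.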
  • 58. FS: Multi-resource scheduling • Scheduling based on multiple resources • CPU, memory • Future: Disk, Network • Why multiple resources? • Better utilization • More fair
  • 59. FS: More features • Preemption • To avoid starvation, preempt tasks using more than their fair share after the preemption timeout • Warn applications. Application can choose to kill any of its containers • Locality through delay scheduling • Try to give node-local, rack-local resources by waiting for some time
  • 60. Enforcing resource limits • Memory • Monitor process usage and kill if it crosses the limit • Disable virtual memory checking • Physical memory checking is being improved • CPU • Cgroups
  • 61. Microsoft Office EULA. Really.
  • 62. MR1 to MR2 Gotchas • AMs can take up all resources • Symptom: Submitted jobs don’t run • Fix in progress: limit the maximum number of applications • Workaround: scheduler allocations to limit number of applications • How to run 4 maps and 2 reduces per node? • Don’t try to tune number of tasks per node • Set assignMultiple to false to spread allocations
  • 63. MR1 to MR2 Gotchas • Comparing MR1 and MR2 benchmarks • TestDFSIO runs best on dedicated CPU/disk, harder to pin • TeraSort changed: less compressible == more network xfer • Resource Allocation vs Resource Consumption • RM allocates resources, heap specified elsewhere • JVM overhead not included • Mind your mapred.[map|reduce].child.java.opts
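
To make the allocation-versus-consumption point on slide 63 concrete: the container size and the JVM heap are set separately, so the heap has to be chosen smaller than the container to leave room for JVM overhead. The sketch below assumes a heap of 80% of the container, which is a rule of thumb picked for the example and not a figure from the talk. It uses the `mapreduce.*` property names from Hadoop 2 rather than the older `mapred.*` spelling on the slide; `java_opts_for` is a hypothetical helper.

```python
def java_opts_for(container_mb, heap_fraction=0.8):
    """Derive a -Xmx flag that leaves headroom inside the container (fraction is an assumption)."""
    heap_mb = int(container_mb * heap_fraction)
    return f"-Xmx{heap_mb}m"


settings = {
    "mapreduce.map.memory.mb": 1024,                  # what the RM allocates
    "mapreduce.map.java.opts": java_opts_for(1024),   # what the JVM may actually use as heap
    "mapreduce.reduce.memory.mb": 2048,
    "mapreduce.reduce.java.opts": java_opts_for(2048),
}
```

If the heap were set equal to the container size, non-heap JVM memory could push the process over its allocation.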
  • 64. MR1 to MR2 Gotchas • Changes in logs, tracing problems harder • MR1: distributed grep on JobId • YARN logs more generic, deal with containers not apps
  • 66. Job History • Job History Viewing was moved to its own server: Job History Server • Helps with load on RM (JT equivalent) • Helps separate MR from YARN
  • 67. How History Flows? • AM • While running, keeps track of all events during execution • On success, before finishing up • Writes the history information to done_intermediate_dir • The JHS • periodically scans the done_intermediate dir • moves the files to done_dir • starts showing the history
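
The flow on slide 67 is a simple hand-off through two directories. The sketch below imitates it on the local filesystem with temporary directories; in a real cluster both directories live in HDFS, and the function names, the file naming, and the event strings here are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def am_write_history(intermediate_dir, job_id, events):
    """AM side: on success, dump the recorded events into the intermediate dir."""
    path = intermediate_dir / f"{job_id}.jhist"
    path.write_text("\n".join(events))
    return path


def jhs_scan_and_move(intermediate_dir, done_dir):
    """JHS side: one scan pass that moves every finished history file to the done dir."""
    moved = []
    for path in sorted(intermediate_dir.iterdir()):
        shutil.move(str(path), str(done_dir / path.name))
        moved.append(path.name)
    return moved


root = Path(tempfile.mkdtemp())
intermediate = root / "done_intermediate"
done = root / "done"
intermediate.mkdir()
done.mkdir()

am_write_history(intermediate, "job_0001", ["JOB_SUBMITTED", "JOB_FINISHED"])
moved = jhs_scan_and_move(intermediate, done)
```

A job only becomes visible in the history UI after the second step, which is why a freshly finished job can be missing for a short while.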
  • 68. History: Important Configuration Properties • yarn.app.mapreduce.am.staging-dir • Default (CM): /user ← Want this also for security • Default (CDH): /tmp/hadoop-yarn/staging • Staging directory for MapReduce applications • mapreduce.jobhistory.done-dir • Default: ${yarn.app.mapreduce.am.staging-dir}/history/done • Final location in HDFS for history files • mapreduce.jobhistory.intermediate-done-dir • Default: ${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate • Location in HDFS where AMs dump history files
  • 69. History: Important Configuration Properties • mapreduce.jobhistory.max-age-ms • Default: 604800000 (7 days) • Max age before JHS deletes history • mapreduce.jobhistory.move.interval-ms • Default: 180000 (3 min) • Frequency at which JHS scans the done_intermediate dir
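
The two millisecond defaults on slide 69 are easy to misread, so here is the conversion spelled out. Only the numbers come from the slide; the variable names are illustrative.

```python
from datetime import timedelta

max_age = timedelta(milliseconds=604800000)    # mapreduce.jobhistory.max-age-ms
move_interval = timedelta(milliseconds=180000) # mapreduce.jobhistory.move.interval-ms

max_age_days = max_age.days
move_interval_minutes = move_interval.total_seconds() / 60
```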
  • 70. History: Miscellaneous • The JHS runs as ‘mapred’, the AM runs as the user who submitted the job, and the RM runs as ‘yarn’ • The done_intermediate dir needs to be writable by the user who submitted the job and readable by ‘mapred’ • The RM, AM, and JHS should have identical versions of the jobhistory-related properties so they all “agree”
  • 71. Application History Server / Timeline Server • Work in progress to capture history and other information for non-MR YARN applications
  • 72. YARN Container Logs • While application is running • Local to the NM: yarn.nodemanager.log-dirs • After application finishes • Logs aggregated to HDFS • yarn.nodemanager.remote-app-log-dir • Disable aggregation? • yarn.log-aggregation-enable
  • 75. Interacting with a YARN cluster • Java API • MR1 – MR2 APIs are compatible • REST API • RM, NM, JHS – all have REST APIs that are very useful • Llama (Long-Lived Application Master) • Cloudera Impala can reserve, use, and release resource allocations without using YARN-managed container processes • CLI • yarn rmadmin, application, etc. • Web UI • New and “improved” – need time to get used to
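
As one example of the REST APIs mentioned on slide 75, the ResourceManager exposes a cluster applications listing under `/ws/v1/cluster/apps`. The sketch below only builds the URL and parses a hand-written sample payload, so it runs without a cluster; the host name, the default port 8088, and the sample application entries are assumptions for illustration.

```python
import json


def apps_url(rm_host, port=8088):
    """URL of the RM's application listing (port 8088 is the usual web UI default)."""
    return f"http://{rm_host}:{port}/ws/v1/cluster/apps"


def running_app_ids(payload):
    """Extract the IDs of RUNNING applications from a parsed RM response."""
    apps = (payload.get("apps") or {}).get("app", [])
    return [app["id"] for app in apps if app.get("state") == "RUNNING"]


# Hand-written sample in the shape the RM returns; not captured from a real cluster.
sample = json.loads("""
{"apps": {"app": [
  {"id": "application_1_0001", "name": "wordcount", "state": "RUNNING"},
  {"id": "application_1_0002", "name": "terasort", "state": "FINISHED"}
]}}
""")

url = apps_url("rm.example.com")
running = running_app_ids(sample)
```

Against a live cluster, the payload would come from an HTTP GET on `url` with a JSON `Accept` header.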
  • 77. YARN Applications • MR2 • Cloudera Impala • Apache Spark • Others? Custom? • Apache Slider (incubating); not production-ready • Accumulo • HBase • Storm
  • 79. The Cloudera View of YARN • Shipping • Enabled by default on CDH5+ • Included for past two years, not enabled • Supported • Recommended
  • 80. Growing Pains • Benchmarking is harder • different utilization paradigm • “whole cluster” benchmarks more important, e.g. SWIM • Tuning still largely trial/error • MR1 was the same originally • YARN/MR2 will get there eventually
  • 81. What Are Customers Doing? • A few are using in production • Many are exploring • Spark • Impala via Llama • Most are waiting
  • 82. Why not Mesos? • Mesos • designed to be completely general purpose • more burden on app developer (offer model vs app request) • YARN • designed with Hadoop in mind • supports Kerberos • more robust/familiar scheduling • rack/machine locality, out of box • Supportability • all commercial Hadoop vendors support YARN • support for Mesos limited to startup Mesosphere
  • 83. Is This the End for MapReduce?
  • 84. Extra special thanks: ALL OF YOU
  • 85. Image Credits • CC BY 2.0 flik https://flic.kr/p/4RVoUX • CC BY 2.0 Ian Sane https://flic.kr/p/nRyHxd • CC BY-NC 2.0 lollyknit https://flic.kr/p/49C1Xi • CC BY-ND 2.0 jankunst https://flic.kr/p/deU71s • CC BY-SA 2.0 pierrepocs https://flic.kr/p/9mgdMd • CC BY-SA 2.0 bekathwia https://flic.kr/p/4FpABU • CC BY-NC-ND 2.0 digitalnc https://flic.kr/p/dxyTt1 • CC BY-NC-ND 2.0 arselectronica https://flic.kr/p/7yw8z2 • CC BY-NC-ND 2.0 yum9me https://flic.kr/p/81hQ49 • CC BY-NC-SA 2.0 jimnix https://flic.kr/p/gsqpWC • Microsoft Office EULA (really)
  • 86. Thank You! • Alex Moundalexis • @technmsg • Insert witty tagline here.