STORM
    COMPARISON – INTRODUCTION – CONCEPTS




PRESENTATION BY KASPER MADSEN
MARCH - 2012
HADOOP                                 VS              STORM
Batch processing                                       Real-time processing
Jobs run to completion                                 Topologies run forever
JobTracker is a SPOF*                                  No single point of failure
Stateful nodes                                         Stateless nodes
Scalable                                               Scalable
Guarantees no data loss                                Guarantees no data loss
Open source                                            Open source


* Hadoop 0.21 added some checkpointing
  SPOF: Single Point Of Failure
COMPONENTS
     Nimbus daemon is comparable to the Hadoop JobTracker. It is the master.
     Supervisor daemon spawns workers; it is comparable to the Hadoop TaskTracker.
     Worker is spawned by the supervisor, one per port defined in the storm.yaml configuration
     (see the sketch below).
     Task is run as a thread inside a worker.
     Zookeeper* is a distributed system used to store metadata. The Nimbus and
     Supervisor daemons are fail-fast and stateless. All state is kept in Zookeeper.




          Notice that all communication between Nimbus and
          the Supervisors is done through Zookeeper.

       On a cluster with 2k+1 Zookeeper nodes, the system
       can recover when at most k nodes fail.




* Zookeeper is an Apache top-level project
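
Each worker slot corresponds to one entry in supervisor.slots.ports. A minimal storm.yaml might look roughly like this (a sketch only; the host names are made up, the keys are the standard 2012-era ones):

     storm.zookeeper.servers:
         - "zk1.example.com"
         - "zk2.example.com"
         - "zk3.example.com"
     nimbus.host: "nimbus.example.com"
     supervisor.slots.ports:      # one worker is spawned per port listed here
         - 6700
         - 6701
         - 6702
         - 6703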
STREAMS
A stream is an unbounded sequence of tuples.
A topology is a graph where each node is a spout or a bolt, and the edges indicate
which bolts subscribe to which streams.
•   A spout is a source of a stream
•   A bolt consumes a stream (and possibly emits a new one)
•   An edge represents a grouping

[Diagram: two spouts are the sources of streams A and B; one bolt subscribes to A and emits C,
another subscribes to A and emits D, a third subscribes to C & D, and a fourth subscribes to A & B.]
GROUPINGS
Each spout or bolt runs X instances in parallel (called tasks).
Groupings are used to decide which task in the subscribing bolt a tuple is sent to.
Shuffle grouping     is a random grouping
Fields grouping      groups by value, so that equal values go to the same task (see the sketch below)
All grouping         replicates to all tasks
Global grouping      makes all tuples go to one task
None grouping        makes the bolt run in the same thread as the bolt/spout it subscribes to
Direct grouping      the producer (the task that emits) controls which consumer task receives the tuple
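
As a rough illustration of shuffle versus fields grouping in the builder API (a sketch only; WordCountBolt is a hypothetical bolt, and the 2012-era backtype.storm API is assumed):

     TopologyBuilder builder = new TopologyBuilder();
     builder.setSpout("words", new TestWordSpout(), 10);

     // Shuffle grouping: each "words" tuple goes to a random ExclamationBolt task
     builder.setBolt("exclaim", new ExclamationBolt(), 3)
            .shuffleGrouping("words");

     // Fields grouping: tuples with the same value of the "word" field always go to
     // the same WordCountBolt task, so per-word state stays on one task
     builder.setBolt("count", new WordCountBolt(), 4)
            .fieldsGrouping("words", new Fields("word"));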
    EXAMPLE
Topology: TestWordSpout → ExclamationBolt ("exclaim1") → ExclamationBolt ("exclaim2")

     TopologyBuilder builder = new TopologyBuilder();

     // Create the spout "words", running 10 tasks
     builder.setSpout("words", new TestWordSpout(), 10);

     // Create the bolt "exclaim1", running 3 tasks,
     // subscribing to stream "words" using shuffle grouping
     builder.setBolt("exclaim1", new ExclamationBolt(), 3)
            .shuffleGrouping("words");

     // Create the bolt "exclaim2", running 2 tasks,
     // subscribing to stream "exclaim1" using shuffle grouping
     builder.setBolt("exclaim2", new ExclamationBolt(), 2)
            .shuffleGrouping("exclaim1");

A bolt can subscribe to an unlimited number of streams by chaining groupings.

The source code for this example is part of the storm-starter project on GitHub.
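
The slide stops at building the topology; submitting it might look roughly like this (a sketch assuming the 2012-era backtype.storm API used by storm-starter):

     Config conf = new Config();
     conf.setDebug(true);

     // run in-process for testing ...
     LocalCluster cluster = new LocalCluster();
     cluster.submitTopology("exclamation", conf, builder.createTopology());
     Utils.sleep(10000);
     cluster.shutdown();

     // ... or submit to a real cluster
     // conf.setNumWorkers(3);
     // StormSubmitter.submitTopology("exclamation", conf, builder.createTopology());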

EXAMPLE – 1
TestWordSpout
public void nextTuple() {
     Utils.sleep(100);
     final String[] words = new String[] {"nathan", "mike", "jackson", "golda", "bertels"};
     final Random rand = new Random();
     final String word = words[rand.nextInt(words.length)];
     _collector.emit(new Values(word));
}



The TestWordSpout emits a random string from the words array every 100 milliseconds.
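
nextTuple alone is not a complete spout; the pieces left off the slide might look roughly like this (a sketch modelled on the storm-starter version, 2012-era API assumed):

     SpoutOutputCollector _collector;

     // open is called once when the spout task is initialized
     public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
          _collector = collector;
     }

     // declares that the spout emits one-field tuples named "word"
     public void declareOutputFields(OutputFieldsDeclarer declarer) {
          declarer.declare(new Fields("word"));
     }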

EXAMPLE – 2
ExclamationBolt

OutputCollector _collector;

// prepare is called when the bolt is created
public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
      _collector = collector;
}

// execute is called for each tuple
public void execute(Tuple tuple) {
     _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
     _collector.ack(tuple);
}

// declareOutputFields is called when the bolt is created
public void declareOutputFields(OutputFieldsDeclarer declarer) {
     declarer.declare(new Fields("word"));
}


declareOutputFields is used to declare streams and their schemas. It is possible to
declare several streams and to specify which stream to use when emitting a tuple
(see the sketch below).
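
A sketch of what declaring and emitting to a second, named stream could look like (the "errors" stream name is made up; declareStream and the stream-id emit overload are the relevant calls):

public void declareOutputFields(OutputFieldsDeclarer declarer) {
     declarer.declare(new Fields("word"));                    // default stream
     declarer.declareStream("errors", new Fields("word"));    // an extra, named stream
}

public void execute(Tuple tuple) {
     String word = tuple.getString(0);
     if (word.isEmpty()) {
          _collector.emit("errors", tuple, new Values(word)); // emit to the "errors" stream
     } else {
          _collector.emit(tuple, new Values(word + "!!!"));   // emit to the default stream
     }
     _collector.ack(tuple);
}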
FAULT TOLERANCE
Zookeeper stores metadata in a very robust way
Nimbus and Supervisor are stateless and only need metadata from ZK to work/restart
When a node dies
   • The tasks will time out and be reassigned to other workers by Nimbus.
When a worker dies
     • The supervisor will restart the worker.
     • Nimbus will reassign the worker to another supervisor if no heartbeats are sent.
     • If that is not possible (no free ports), the tasks will be run on other workers in the
       topology. If more capacity is added to the cluster later, STORM will
       automatically initialize a new worker and spread out the tasks.
When Nimbus or a Supervisor dies
     •   Workers will continue to run
     •   Workers cannot be reassigned without Nimbus
     •   Nimbus and Supervisor should be run under a process monitoring tool, to
         restart them automatically if they fail.
AT-LEAST-ONCE PROCESSING
STORM guarantees at-least-once processing of tuples.
Message id is assigned to a tuple when it is emitted from a spout or bolt; it is a 64-bit number.
Tree of tuples is the set of tuples generated (directly and indirectly) from a spout tuple.
Ack is called on the spout when the tree of tuples for a spout tuple is fully processed.
Fail is called on the spout if one of the tuples in the tree of tuples fails, or the tree of
tuples is not fully processed within a specified timeout (default is 30 seconds).
It is possible to specify the message id when emitting a tuple. This might be useful for
replaying tuples from a queue (see the sketch below).




                   The ack/fail method is called when the tree of
                   tuples has been fully processed, or has
                   failed / timed out.
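
A sketch of how a spout could attach its own message ids and react to ack/fail (the queue client and its methods are hypothetical placeholders):

// in nextTuple(): attach the queue's message id when emitting
Message msg = queue.poll();                        // hypothetical queue client
if (msg != null) {
     _collector.emit(new Values(msg.body()), msg.getId());
}

// called by STORM when the tree of tuples for this id is fully processed
public void ack(Object msgId) {
     queue.delete(msgId);
}

// called by STORM when the tree of tuples fails or times out
public void fail(Object msgId) {
     queue.requeue(msgId);
}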
AT-LEAST-ONCE PROCESSING – 2
Anchoring is used to copy the spout tuple message id(s) to the new tuples
generated. In this way, every tuple knows the message id(s) of all its spout tuples.
Multi-anchoring is when multiple tuples are anchored. If the tuple tree fails, then
multiple spout tuples will be replayed. Useful for doing streaming joins and more.
Ack called from a bolt indicates the tuple has been processed as intended.
Fail called from a bolt replays the spout tuple(s).
Every tuple must be acked or failed, or the task will run out of memory at some point.




_collector.emit(tuple, new Values(word));    // uses anchoring

_collector.emit(new Values(word));           // does NOT use anchoring
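
A sketch of an execute method that anchors, acks, and explicitly fails on error (process(...) is a hypothetical helper):

public void execute(Tuple tuple) {
     try {
          String result = process(tuple.getString(0));   // hypothetical processing step
          _collector.emit(tuple, new Values(result));    // anchored emit
          _collector.ack(tuple);                         // tuple handled successfully
     } catch (Exception e) {
          _collector.fail(tuple);                        // the spout tuple(s) will be replayed
     }
}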
AT-LEAST-ONCE PROCESSING – 3
Acker tasks track the tree of tuples for every spout tuple
     •   The acker task responsible for a given spout tuple is determined by modulo
         on the message id. Since all tuples carry all spout tuple message ids, it is easy
         to call the correct acker tasks.
     •   The acker task stores a map; the format is {spoutMsgId, {spoutTaskId, ”ack val”}}
     •   The ”ack val” represents the state of the entire tree of tuples. It is the XOR of
         all tuple message ids created and acked in the tree of tuples.
     •   When the ”ack val” is 0, the tuple tree is fully processed.
     •   Since message ids are random 64-bit numbers, the chance of the ”ack val”
         becoming 0 by accident is extremely small.




               It is important to set the number of acker tasks in the topology when
               processing large amounts of tuples (it defaults to 1); see the sketch below.
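
Setting the acker parallelism when building the topology configuration might look like this (a sketch; setNumAckers is the 2012-era Config setter):

Config conf = new Config();
conf.setNumAckers(4);     // run 4 acker tasks instead of the default 1
StormSubmitter.submitTopology("exclamation", conf, builder.createTopology());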
AT-LEAST-ONCE PROCESSING – 4
    Example
[Diagram: a Spout (task 1) emits ”hey” with msgId 10 to a Bolt (task 2); that bolt then emits ”h”
(spoutIds: 10, msgId: 2) to a Bolt (task 3) and ”ey” (spoutIds: 10, msgId: 3) to a Bolt (task 4).]
Shows what happens in the acker task for one spout tuple. Format is: {spoutMsgId, {spoutTaskId, ”ack val”}}
(4-bit ids are shown for readability; in reality the ids are 64 bits)
1. After emit ”hey”: {10, {1, 0000 XOR 1010 = 1010}}
2. After emit ”h”:   {10, {1, 1010 XOR 0010 = 1000}}
3. After emit ”ey”:  {10, {1, 1000 XOR 0011 = 1011}}
4. After ack ”hey”:  {10, {1, 1011 XOR 1010 = 0001}}
5. After ack ”h”:    {10, {1, 0001 XOR 0010 = 0011}}
6. After ack ”ey”:   {10, {1, 0011 XOR 0011 = 0000}}
7. Since the ”ack val” is 0, the spout tuple with id 10 must be fully processed. Ack is called on the spout (task 1).
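
The same bookkeeping can be reproduced in a few lines of Java (an illustration of the principle only, not STORM's actual acker code; the toy 4-bit ids from the slide are used):

long ackVal = 0;
long hey = 0b1010, h = 0b0010, ey = 0b0011;   // toy ids; STORM uses random 64-bit ids

ackVal ^= hey;  ackVal ^= h;  ackVal ^= ey;   // the three tuples are created (emitted)
ackVal ^= hey;  ackVal ^= h;  ackVal ^= ey;   // the three tuples are acked

System.out.println(ackVal == 0);              // true: the tree is fully processed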
AT-LEAST-ONCE PROCESSING – 5
A tuple isn't acked because the task died:
The spout tuple(s) at the root of the tree of tuples will time out and be replayed.
Acker task dies:
All the spout tuples the acker was tracking will time out and be replayed.
Spout task dies:
In this case the source that the spout talks to is responsible for replaying the
messages. For example, queues like Kestrel and RabbitMQ will place all pending
messages back on the queue when a client disconnects.
AT-LEAST-ONCE PROCESSING – 6
At-least-once processing might process a tuple more than once.
Example
[Diagram: a Spout (task 1) connected with an all grouping to two Bolt tasks (2 and 3).]
1. A spout tuple is emitted to task 2 and 3
2. Worker responsible for task 3 fails
3. Supervisor restarts worker
4. Spout tuple is replayed and emitted to task 2 and 3
5. Task 2 will now have executed the same bolt twice




Consider why the all grouping is not important in this example
EXACTLY-ONCE-PROCESSING
Transactional topologies (TT) are an abstraction built on STORM primitives.
TT guarantees exactly-once processing of tuples.
Acking is optimized in TT; there is no need to do anchoring or acking manually.
Bolts execute as new instances per attempt at processing a batch.


Example
[Diagram: a Spout (task 1) connected with an all grouping to two Bolt tasks (2 and 3).]
1. A spout tuple is emitted to task 2 and 3
2. Worker responsible for task 3 fails
3. Supervisor restarts worker
4. Spout tuple is replayed and emitted to task 2 and 3
5. Task 2 and 3 initiate new bolt instances because of the new attempt
6. Now there is no problem
EXACTLY-ONCE-PROCESSING – 2
For efficiency, batch processing of tuples is introduced in TT.
A batch has two states: processing or committing.
Many batches can be in the processing state concurrently.
Only one batch can be in the committing state, and a strong ordering is imposed. That
means batch 1 will always be committed before batch 2, and so on.
Types of bolts for TT: BasicBolt, BatchBolt, BatchBolt marked as committer
BasicBolt processes one tuple at a time.
BatchBolt processes batches; finishBatch is called when all tuples of the batch have been executed.
BatchBolt marked as committer calls finishBatch only when the batch is in the
committing state (see the sketch below).
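
A sketch of what a BatchBolt marked as committer looked like in the 2012-era transactional API (modelled loosely on the TransactionalGlobalCount example in storm-starter; the class name is made up, and in practice the count would be persisted to a store together with the transaction id):

public static class GlobalCount extends BaseTransactionalBolt implements ICommitter {
     TransactionAttempt _attempt;
     BatchOutputCollector _collector;
     int _count = 0;

     public void prepare(Map conf, TopologyContext context,
                         BatchOutputCollector collector, TransactionAttempt attempt) {
          _collector = collector;
          _attempt = attempt;
     }

     public void execute(Tuple tuple) {
          _count++;                                   // called for every tuple in the batch
     }

     public void finishBatch() {
          // only called when the batch is in the committing state, in strict
          // transaction order, so the count can safely be written with the txid
          _collector.emit(new Values(_attempt, _count));
     }

     public void declareOutputFields(OutputFieldsDeclarer declarer) {
          declarer.declare(new Fields("id", "count"));
     }
}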
EXACTLY-ONCE-PROCESSING – 3
A transactional spout has the capability to replay exact batches of tuples.
[Diagram: a transactional spout feeding a pipeline of batch bolts A, B, C and D; bolts B and D are
marked as committers.]
BATCH IS IN PROCESSING STATE
Bolt A:   execute method is called for all tuples received from spout
          finishBatch is called as soon as the full batch has been received
Bolt B:   execute method is called for all tuples received from bolt A
          finishBatch is NOT called because batch is in processing state
Bolt C:   execute method is called for all tuples received from bolt A (and B)
          finishBatch is NOT called, because bolt B has not called finishBatch
Bolt D:   execute method is called for all tuples received from bolt C
          finishBatch is NOT called because batch is in processing state
BATCH CHANGES TO COMMITTING STATE
Bolt B:   finishBatch is called
Bolt C:   finishBatch is called, because we know we got all tuples from Bolt B now
Bolt D:   finishBatch is called, because we know we got all tuples from Bolt C now
EXACTLY-ONCE-PROCESSING – 4
[Diagram: internally, the transactional spout consists of a coordinator (a regular spout with a
parallelism of 1, defining the streams ”batch” & ”commit”) and emitter tasks (a regular bolt with
a parallelism of P), connected with all groupings on the batch stream.]

When a batch should enter the processing state:
•  The coordinator emits a tuple with the TransactionAttempt and the metadata for that
   transaction to the "batch" stream.
•  All emitter tasks receive the tuple and begin to emit their portion of tuples for
   the given batch.

When the processing phase of the batch is done (determined by the acker task):
•  Ack gets called on the coordinator.

When ack gets called on the coordinator and all prior transactions have committed:
•  The coordinator emits a tuple with the TransactionAttempt to the commit stream.
•  All bolts which are marked as committers subscribe to the commit stream of the
   coordinator using an all grouping.
•  Bolts marked as committers now know the batch is in the committing phase.

When the batch is fully processed again (determined by the acker task):
•  Ack gets called on the coordinator.
•  The coordinator now knows the batch is committed.
STORM LIBRARIES
STORM uses a lot of libraries. The most prominent are
Clojure     a new Lisp programming language. Crash-course follows
Jetty       an embedded web server, used to host the Nimbus UI
Kryo        a fast serializer, used when sending tuples
Thrift      a framework for building services. Nimbus is a Thrift daemon
ZeroMQ      a very fast transport layer
Zookeeper   a distributed system for storing metadata
LEARN MORE
Wiki (https://github.com/nathanmarz/storm/wiki)
Storm-starter (https://github.com/nathanmarz/storm-starter)
Mailing list (http://groups.google.com/group/storm-user)
#storm-user room on freenode




