2. Who am I
> Masahiro Nakagawa
> github: @repeatedly
> Treasure Data, Inc.
> Senior Software Engineer
> Fluentd / td-agent developer
> Living at OSS :)
> D language - committer on Phobos, a.k.a. the standard library
> Fluentd - Main maintainer
> MessagePack / RPC - D and Python (only RPC)
> The organizer of several meetups (Presto, DTM, etc…)
> etc…
4. What’s Fluentd?
> Data collector for unified logging layer
> Streaming data transfer based on JSON
> Simple core + plugins written in Ruby
> Various gem-based plugins
> http://www.fluentd.org/plugins
> List of users
> http://www.fluentd.org/testimonials
9. Core Plugins
Common Concerns
> Divide & Conquer
> Buffering & Retrying
> Error handling
> Message routing
> Parallelism
Use Case Specific
> Read / receive data
> Parse data
> Filter data
> Buffer data
> Format data
> Write / send data
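The use-case-specific stages listed above (read, parse, filter, buffer, format, write) can be pictured as one small pipeline. The following is only a sketch under stated assumptions: the `key=value` input format, the `debug` filter rule, and every function name are hypothetical and are not Fluentd's plugin API.

```python
import json

def parse(line):
    # Hypothetical input format: "key=value key=value"
    return dict(pair.split("=", 1) for pair in line.split())

def keep(record):
    # Hypothetical filter rule: drop debug-level records
    return record.get("level") != "debug"

def fmt(record):
    # Format a record as a JSON string for the destination
    return json.dumps(record, sort_keys=True)

def run_pipeline(lines, write, chunk_size=2):
    buffer = []
    for line in lines:                 # read / receive data
        record = parse(line)           # parse data
        if not keep(record):           # filter data
            continue
        buffer.append(record)          # buffer data
        if len(buffer) >= chunk_size:
            write([fmt(r) for r in buffer])   # format, then write / send
            buffer = []
    if buffer:                         # flush whatever is left
        write([fmt(r) for r in buffer])

lines = [
    "level=info msg=start",
    "level=debug msg=noise",
    "level=info msg=stop",
    "level=warn msg=slow",
]
out = []
run_pipeline(lines, out.append)
print(out)
```

Each stage is a separate function so that one stage can be swapped without touching the others, which is the point of splitting the work across plugins.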
11. Event structure (log message)
✓ Time
> second resolution by default
> taken from the data source
✓ Tag
> used for message routing
> indicates where the event came from
✓ Record
> JSON format
> MessagePack internally
> schema-free
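To make the three parts concrete, here is a minimal sketch of one event as a (tag, time, record) triple. The field names and values in the record are made up for illustration, and only the JSON view is shown because MessagePack is not in Python's standard library.

```python
import json
import time

# Hypothetical event values, chosen only for illustration.
tag = "backend.apache"          # used for routing; says where the event came from
event_time = int(time.time())   # Unix time in whole seconds
record = {"method": "GET", "path": "/index.html", "code": 200}  # schema-free

event = (tag, event_time, record)

# JSON view of the record part of the event.
encoded = json.dumps(record, sort_keys=True)
print(tag, event_time, encoded)
```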
12. Reliable streaming data transfer
[Diagram labels: Batch; Stream; Other stream (micro batch); error and retry on each transfer]
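The retry-on-error idea can be sketched as a flush routine that keeps trying a buffered chunk with a growing wait between attempts. This is an illustrative sketch, not Fluentd's implementation: `flush_with_retry`, its parameters, and the flaky destination are all hypothetical.

```python
import time

def flush_with_retry(chunk, write, max_retries=5, base_wait=0.01, sleep=time.sleep):
    """Try to write one buffered chunk, retrying with exponential backoff.

    Returns the number of retries that were needed.
    """
    for attempt in range(max_retries + 1):
        try:
            write(chunk)
            return attempt
        except IOError:
            if attempt == max_retries:
                raise  # give up; the caller still holds the chunk
            sleep(base_wait * (2 ** attempt))

# Usage: a hypothetical destination that fails twice before succeeding.
calls = {"n": 0}
delivered = []

def flaky_write(chunk):
    calls["n"] += 1
    if calls["n"] <= 2:
        raise IOError("destination unavailable")
    delivered.extend(chunk)

retries = flush_with_retry(
    [{"msg": "a"}, {"msg": "b"}], flaky_write, sleep=lambda seconds: None
)
print(retries, delivered)
```

Because the chunk stays in the buffer until `write` succeeds, a temporary outage at the destination delays delivery instead of losing data.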
16. # logs from a file
<source>
  type tail
  path /var/log/httpd.log
  pos_file /tmp/pos_file
  format apache2
  tag backend.apache
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to MongoDB
<match backend.*>
  type mongo
  database fluent
  collection test
</match>
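The `<match backend.*>` line routes events by tag, where `*` stands for exactly one dot-separated tag part. A rough sketch of that matching rule follows; `tag_pattern_to_regex` is a hypothetical helper, it handles only the single `*` wildcard, and it is not how Fluentd implements matching.

```python
import re

def tag_pattern_to_regex(pattern):
    """Turn a match pattern such as "backend.*" into a compiled regex.

    Only "*" (exactly one tag part) is handled in this sketch.
    """
    parts = []
    for part in pattern.split("."):
        if part == "*":
            parts.append(r"[^.]+")       # one tag part, no dots
        else:
            parts.append(re.escape(part))
    return re.compile(r"\A" + r"\.".join(parts) + r"\Z")

def matches(pattern, tag):
    return tag_pattern_to_regex(pattern).match(tag) is not None

print(matches("backend.*", "backend.apache"))
print(matches("backend.*", "backend.apache.error"))
print(matches("backend.*", "web.access"))
```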
19. # logs from a file
<source>
  type tail
  path /var/log/httpd.log
  pos_file /tmp/pos_file
  format apache2
  tag web.access
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to ES and HDFS
<match web.*>
  type copy
  <store>
    type elasticsearch
    logstash_format true
  </store>
  <store>
    type webhdfs
    host namenode
    port 50070
    path /path/on/hdfs/
  </store>
</match>
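The `copy` output with two `<store>` sections sends every matched event to both destinations. As a hedged illustration of that fan-out, the sketch below uses plain Python lists as stand-ins for Elasticsearch and HDFS; `copy_to_stores` is a hypothetical name, and handing each store its own shallow copy is a choice made for this sketch, not a statement about the plugin.

```python
def copy_to_stores(event, stores):
    """Hand the same event to every store, in order."""
    for store in stores:
        store(dict(event))  # shallow copy per store (a choice made in this sketch)

# Stand-ins for the two destinations in the config above.
es_docs = []
hdfs_lines = []

event = {"tag": "web.access", "path": "/", "code": 200}
copy_to_stores(event, [es_docs.append, hdfs_lines.append])
print(es_docs, hdfs_lines)
```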
20. CEP for Stream Processing
Norikra is a SQL-based CEP engine: http://norikra.github.io/
37. fluent-bit
> Made for Embedded Linux
> OpenEmbedded & Yocto Project
> Intel Edison, Raspberry Pi & BeagleBone Black boards
> https://github.com/fluent/fluent-bit
> Standalone application or Library mode
> Built-in plugins
> input: cpu, kmsg, output: fluentd
> First release at the end of March 2015
39. Treasure Agent (td-agent)
> Treasure Data distribution of Fluentd
> including Ruby and QA’ed plugins
> Treasure Agent 2 is current stable
> We recommend using v2, not v1
> including fluentd-ui
> Next release, 2.2.0, uses fluentd v0.12
40. Embulk
> Bulk Loader version of Fluentd
> Pluggable architecture
> JRuby, JVM languages
> High performance parallel processing
> Share your script as a plugin
> https://github.com/embulk
> http://www.slideshare.net/frsyuki/embuk-making-data-integration-works-relaxed