5. Treasure Data Service
> A simplified cloud analytics infrastructure
> Customers focus on their business
> SQL interfaces for schema-less data sources
> A good fit for Data Hub / Data Lake use cases
> Batch / low-latency / machine-learning workloads
> Many ingestion and integration solutions
> Fluentd / Embulk / Data Connector / SDKs
> Result Output / Prestogres Gateway / BI tools
> Great support for short time-to-value
8. Plazma by the numbers
> Streaming import
> 45 billion records / day
> Bulk Import
> 10 billion records / day
> Hive Query
> 3+ trillion records / day
> Machine learning queries (Hivemall) have increased
> Presto Query
> 3+ trillion records / day
9. TD's resource management
> Guarantee and boost compute resources (see the toy sketch below)
> Guarantee: stabilizes query performance
> Boost: shares free resources
> Gains the benefits of multi-tenancy
> Global resource scheduler
> Manages jobs, resources, and priorities across users
> Separate storage from compute resources
> Easy to scale workers
> We can use S3 / GCS / Azure Storage as a reliable backend
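A toy sketch of the guarantee + boost idea (not TD's actual scheduler; all names and numbers are illustrative): each account first receives its guaranteed workers, then idle capacity is shared out as boost.

def assign_workers(total, demands, guarantees):
    """demands / guarantees: {account: worker count}."""
    # 1. Guarantee: every account gets min(demand, guarantee),
    #    which stabilizes its baseline query performance.
    alloc = {a: min(demands[a], guarantees.get(a, 0)) for a in demands}
    free = total - sum(alloc.values())
    # 2. Boost: share the remaining free workers across accounts
    #    that still have unmet demand (the multi-tenant merit).
    pending = {a: demands[a] - alloc[a] for a in demands}
    while free > 0 and any(pending.values()):
        for a in sorted(pending):
            if free > 0 and pending[a] > 0:
                alloc[a] += 1
                pending[a] -= 1
                free -= 1
    return alloc

print(assign_workers(10, {"a": 8, "b": 4}, {"a": 3, "b": 3}))
# -> {'a': 6, 'b': 4}: guarantees first, then 4 free workers as boost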
11. Import
[Diagram: td-agent / fluentd → API Server → Import Queue (MySQL / PerfectQueue) → Import Worker]
✓ Buffering for 5 minutes
✓ Retrying (at-least-once)
✓ On-disk buffering on failure
✓ Unique ID for each chunk
MessagePack: "It's like JSON. But fast and small."
unique_id=375828ce5510cadb
{"time":1426047906,"uid":1,…}
{"time":1426047912,"uid":9,…}
{"time":1426047939,"uid":3,…}
{"time":1426047951,"uid":2,…}
…
12. Import
[Same import pipeline as slide 11, with buffering for 1 minute; the Import Queue (MySQL / PerfectQueue) records each chunk's unique ID]

unique_id         time
375828ce5510cadb  2015-12-01 10:47
2024cffb9510cadc  2015-12-01 11:09
1b8d6a600510cadd  2015-12-01 11:21
1f06c0aa510caddb  2015-12-01 11:38
13. Import
[Same import pipeline as slide 11; a UNIQUE constraint on unique_id rejects duplicated chunks — see the sketch below]

unique_id         time
375828ce5510cadb  2015-12-01 10:47
2024cffb9510cadc  2015-12-01 11:09
1b8d6a600510cadd  2015-12-01 11:21
1f06c0aa510caddb  2015-12-01 11:38

UNIQUE (at-most-once)
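Retrying (at-least-once) plus the unique-ID constraint (at-most-once) combine into effectively-once ingestion. A minimal sketch of that idea, using sqlite3 as a stand-in for the MySQL-backed PerfectQueue (table and column names are illustrative):

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE import_queue (
    unique_id TEXT PRIMARY KEY,   -- UNIQUE: duplicates are rejected
    chunk     BLOB NOT NULL
)""")

def enqueue(unique_id, chunk):
    """Called by the API server; safe to retry (at-least-once),
    because the PRIMARY KEY makes the insert at-most-once."""
    try:
        db.execute("INSERT INTO import_queue VALUES (?, ?)", (unique_id, chunk))
        db.commit()
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate chunk: already enqueued, drop it

assert enqueue("375828ce5510cadb", b"...msgpack...") is True
assert enqueue("375828ce5510cadb", b"...msgpack...") is False  # retried upload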
16. Realtime Storage
[Diagram: Import Queue → Import Workers → Realtime Storage on Amazon S3 / Basho Riak CS; metadata of the records in each file is stored on PostgreSQL; merged files later move to Archive Storage — see the sketch below]

uploaded time     file index range                            records
2015-03-08 10:47  [2015-12-01 10:47:11, 2015-12-01 10:48:13]  3
2015-03-08 11:09  [2015-12-01 11:09:32, 2015-12-01 11:10:35]  25
2015-03-08 11:38  [2015-12-01 11:38:43, 2015-12-01 11:40:49]  14
…                 …                                           …
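A sketch of the worker-side handoff (bucket, table, and column names are illustrative assumptions): upload the file to object storage, then register its time range in the metadata table so query engines can find it.

import boto3     # assumed AWS SDK; Riak CS speaks the same S3 API
import psycopg2  # assumed PostgreSQL driver

s3 = boto3.client("s3")
pg = psycopg2.connect("dbname=plazma")  # hypothetical DSN

def put_to_realtime_storage(path, payload, begin, end, n_records):
    """Upload one imported chunk, then register its time range so
    query engines can see the file immediately."""
    s3.put_object(Bucket="realtime-storage", Key=path, Body=payload)
    with pg, pg.cursor() as cur:  # commits on success
        cur.execute("""
            INSERT INTO realtime_metadata (uploaded_at, path, index_range, records)
            VALUES (now(), %s, tsrange(%s, %s), %s)""",
            (path, begin, end, n_records))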
17. Merge Worker
[Diagram: the Merge Worker (MapReduce) compacts files from Realtime Storage into Archive Storage on Amazon S3 / Basho Riak CS; metadata on PostgreSQL]

Realtime Storage metadata:
uploaded time     file index range                            records
2015-03-08 10:47  [2015-12-01 10:47:11, 2015-12-01 10:48:13]  3
2015-03-08 11:09  [2015-12-01 11:09:32, 2015-12-01 11:10:35]  25
2015-03-08 11:38  [2015-12-01 11:38:43, 2015-12-01 11:40:49]  14
…                 …                                           …

Archive Storage metadata:
file index range                            records
[2015-12-01 10:00:00, 2015-12-01 11:00:00]  3,312
[2015-12-01 11:00:00, 2015-12-01 12:00:00]  2,143
…                                           …

Merge every 1 hour. Retrying + Unique (at-least-once + at-most-once) — see the sketch below
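A sketch of the hourly merge bookkeeping, assuming the hypothetical metadata tables above. The real Merge Worker is a MapReduce job, but the metadata side fits in one PostgreSQL transaction, so a retried merge stays effectively-once:

def merge_hour(pg, hour_begin, hour_end, merged_path, n_records):
    """Move one hour of realtime files into a single archive file.
    One transaction: if a retried merge (at-least-once) runs twice,
    the archive_metadata primary key rejects the second insert
    (at-most-once), and nothing is left half-applied."""
    with pg, pg.cursor() as cur:
        cur.execute("""
            DELETE FROM realtime_metadata
            WHERE index_range && tsrange(%s, %s)""", (hour_begin, hour_end))
        cur.execute("""
            INSERT INTO archive_metadata (path, index_range, records)
            VALUES (%s, tsrange(%s, %s), %s)""",
            (merged_path, hour_begin, hour_end, n_records))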
18. Metadata index
[Diagram: same Realtime / Archive Storage metadata tables as slide 17]
GiST (R-tree) index on the "time" column (the files' index ranges)
Read from Archive Storage if merged; otherwise, from Realtime Storage
19. Data Importing
> Scalable & reliable importing
> Fluentd buffers data on disk
> The import queue deduplicates uploaded chunks
> Workers take the chunks and put them into Realtime Storage
> Instant visibility
> Imported data is immediately visible to query engines
> Background workers merge the files every 1 hour
> Metadata
> The index is built on PostgreSQL using the RANGE type and a
GiST index (see the sketch below)
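A minimal sketch of that index, assuming a hypothetical archive_metadata table; tsrange, the && overlap operator, and GiST indexes are standard PostgreSQL features:

import psycopg2  # assumed driver; any PostgreSQL client works

conn = psycopg2.connect("dbname=plazma")  # hypothetical DSN
cur = conn.cursor()

# One row per Archive Storage file; index_range is the time range of
# the records inside that file.
cur.execute("""
    CREATE TABLE IF NOT EXISTS archive_metadata (
        path        text    PRIMARY KEY,
        index_range tsrange NOT NULL,
        records     bigint  NOT NULL
    )""")

# GiST turns range-overlap lookups into an R-tree style index search.
cur.execute("""
    CREATE INDEX IF NOT EXISTS archive_time_idx
    ON archive_metadata USING gist (index_range)""")

# Partition pruning: fetch only the files overlapping the queried hour.
cur.execute("""
    SELECT path FROM archive_metadata
    WHERE index_range && tsrange('2015-12-01 11:00:00',
                                 '2015-12-01 12:00:00')""")
conn.commit()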
21. Archive Storage
[Two hourly log files:]
time                 code   method
2015-12-01 10:02:36  200    GET
2015-12-01 10:22:09  404    GET
2015-12-01 10:36:45  200    GET
2015-12-01 10:49:21  200    POST
…                    …      …

time                 code   method
2015-12-01 11:10:09  200    GET
2015-12-01 11:21:45  200    GET
2015-12-01 11:38:59  200    GET
2015-12-01 11:43:37  200    GET
2015-12-01 11:54:52  "200"  GET
…                    …      …

Files on Amazon S3 / Basho Riak CS, metadata on PostgreSQL:
path  index range                                 records
…     [2015-12-01 10:00:00, 2015-12-01 11:00:00]  3,312
…     [2015-12-01 11:00:00, 2015-12-01 12:00:00]  2,143
…     …                                           …

MessagePack Columnar File Format
22. Partitioning
[Same hourly files and metadata as slide 21, annotated with:]
> time-based partitioning (across files, one per 1-hour index range)
> column-based partitioning (within a file: MessagePack Columnar File Format)
Files on Amazon S3 / Basho Riak CS
Metadata on PostgreSQL
23. Partition pruning
[Same hourly files and metadata as slide 21, with time-based and column-based partitioning]

SELECT code, COUNT(1) FROM logs
WHERE time >= '2015-12-01 11:00:00'
GROUP BY code

The WHERE clause prunes to the files whose index range can match (time-based), and only the referenced columns are read (column-based) — see the sketch below.
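A toy sketch of the time-based pruning (file names and structures are illustrative): skip every file whose index range cannot contain a matching row, so the 10:00 file is never touched.

from datetime import datetime

# Hypothetical in-memory stand-in for the PostgreSQL metadata table.
FILES = [
    {"path": "f1", "begin": datetime(2015, 12, 1, 10), "end": datetime(2015, 12, 1, 11)},
    {"path": "f2", "begin": datetime(2015, 12, 1, 11), "end": datetime(2015, 12, 1, 12)},
]

def prune(files, time_lower):
    """Keep only files that may contain rows with time >= time_lower."""
    return [f["path"] for f in files if f["end"] > time_lower]

# SELECT code, COUNT(1) ... WHERE time >= '2015-12-01 11:00:00'
print(prune(FILES, datetime(2015, 12, 1, 11)))  # ['f2'] — f1 is skipped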
24. Handling Eventual Consistency
1. Write data / metadata first
> At this point the data is not visible yet
2. Check whether the data is available (see the sketch below)
> GET, GET, GET…
3. Data becomes visible
> Queries now include the imported data!
Ex. Netflix's case:
> https://github.com/Netflix/s3mper
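A minimal sketch of step 2, assuming boto3 against S3 (bucket and key names are illustrative): poll until the object is readable, and only then expose it to query engines.

import time
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def wait_until_visible(bucket, key, attempts=30, delay=1.0):
    """GET (HEAD) repeatedly until the eventually-consistent store
    serves the object; only then mark it visible to queries."""
    for _ in range(attempts):
        try:
            s3.head_object(Bucket=bucket, Key=key)
            return True
        except ClientError:
            time.sleep(delay)  # not visible yet; retry
    return False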
25. Hiding network cost
> Open many connections to the object storage
> Use the Range feature with columnar offsets
> Improves scan performance on partitioned data
> Detect recoverable errors
> We keep error lists for fault tolerance
> Stall checker (see the sketch below)
> Watches the progress of reading data
> If processing time reaches a threshold, re-connect to the object
storage and re-read the data
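A sketch of a stall checker (all names and the threshold are illustrative): wrap the read loop, and when no progress is made within the threshold, drop the connection and resume from the last good offset.

import time

STALL_THRESHOLD = 30.0  # seconds without progress

def read_with_stall_check(open_range_reader, length):
    """open_range_reader(offset) -> file-like object reading from
    `offset` (e.g. an HTTP Range request against object storage)."""
    offset = 0
    reader = open_range_reader(offset)
    last_progress = time.monotonic()
    chunks = []
    while offset < length:
        data = reader.read(min(65536, length - offset))
        if data:
            chunks.append(data)
            offset += len(data)
            last_progress = time.monotonic()
        elif time.monotonic() - last_progress > STALL_THRESHOLD:
            # Stalled: re-connect and re-read from the last good offset.
            reader = open_range_reader(offset)
            last_progress = time.monotonic()
        else:
            time.sleep(0.1)  # wait for more bytes
    return b"".join(chunks)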
27. Recoverable errors
> Error types
> User error
> Syntax error, semantic error
> Insufficient resources
> Exceeded task memory size
> Internal failure
> I/O error of S3 / Riak CS
> Worker failure
> etc.
We can retry the internal-failure patterns (see the sketch below)
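A sketch of how such a classification might drive retries (exception names are illustrative): user and resource errors fail fast, internal failures are retried with backoff.

import time

class UserError(Exception): pass      # syntax / semantic error
class ResourceError(Exception): pass  # exceeded task memory size
class InternalError(Exception): pass  # S3 / Riak CS I/O, worker failure

def run_with_retry(task, max_attempts=5):
    """Fail fast on user errors; back off and retry internal failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except (UserError, ResourceError):
            raise                     # not recoverable by retrying
        except InternalError:
            if attempt == max_attempts:
                raise
            time.sleep(2 ** attempt)  # recoverable: retry with backoff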
29. Presto retry on Internal Errors
> Queries succeed eventually
[Chart (log scale)]
30. Schema change
[Two files: the newer one adds a "user" column, and one row stores code as the string "200"]

time                 code   method
2015-12-01 10:02:36  200    GET
2015-12-01 10:22:09  404    GET
2015-12-01 10:36:45  200    GET
2015-12-01 10:49:21  200    POST
…                    …      …

user  time                 code   method
391   2015-12-01 11:10:09  200    GET
482   2015-12-01 11:21:45  200    GET
573   2015-12-01 11:38:59  200    GET
664   2015-12-01 11:43:37  200    GET
755   2015-12-01 11:54:52  "200"  GET
…     …                    …      …
31. Schema-on-Read
[Same two files as slide 30]
MessagePack Columnar File Format is schema-less
✓ Instant schema change
SQL is schema-full
✓ SQL doesn't work without a schema
→ Schema-on-Read (see the sketch below)
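A minimal sketch of Schema-on-Read (the schema and records are illustrative): store schema-less MessagePack records, and apply the table schema only when reading, so older rows without "user" and the stray string "200" still satisfy the SQL schema.

import msgpack  # pip install msgpack

# Schema-less records as stored: columns and value types vary per row.
stored = [msgpack.packb(r) for r in [
    {"time": 1426047906, "code": 200, "method": "GET"},
    {"time": 1426047912, "user": 391, "code": "200", "method": "GET"},
]]

# The table schema is applied only at read time (Schema-on-Read).
SCHEMA = [("user", int), ("time", int), ("code", int), ("method", str)]

def read_row(raw):
    rec = msgpack.unpackb(raw, raw=False)
    row = []
    for name, typ in SCHEMA:
        value = rec.get(name)
        try:
            row.append(typ(value) if value is not None else None)
        except (TypeError, ValueError):
            row.append(None)  # unparsable value becomes NULL
    return row

for raw in stored:
    print(read_row(raw))
# [None, 1426047906, 200, 'GET']   <- old row: no "user" column yet
# [391, 1426047912, 200, 'GET']    <- "200" cast to 200 at read time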
37. Hadoop
> Distributed computing framework
> Consists of many components…
http://hortonworks.com/hadoop-tutorial/introducing-apache-hadoop-developers/
38. Presto
> A distributed SQL query engine for interactive data analysis
against GBs to PBs of data
> Open sourced by Facebook
> https://github.com/facebook/presto
39. Conclusion
> Build a scalable data analytics platform on the cloud
> Separate compute resources from storage
> Loosely coupled components
> We have lots of useful OSS and services :)
> There are many trade-offs
> Use an existing component or create a new one?
> Stick to the basics!
> If you're tired, please use Treasure Data ;)