Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, Anton Gorshkov

Building a real-time pipeline from scratch that can handle a billion-plus transactions per day, and storing, analyzing and visualizing it all in real time, has never been easier. In this build-as-we-go talk, we’ll create a front-to-back architecture that does exactly that.

* we’ll start with a simple producer emitting a few messages and publishing them to a Kafka queue
* on the consuming end of the queue, a Spark-based Streamliner process will pick them up and store them in MemSQL
* ZoomData will connect to MemSQL for real-time visualization, where we’ll be able to ask various questions and see the answers change as data flows through the system
* we’ll quickly make the pipeline more demanding by increasing both the volume and the complexity of the data, until it reaches 100K transactions per second
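
The flow in the steps above (producer, queue, consumer, SQL store, live query) can be sketched in a few lines. This is only an illustration using Python standard-library stand-ins, not the components from the talk: `queue.Queue` plays the Kafka queue, an in-memory `sqlite3` database plays MemSQL, and a `GROUP BY` query plays the kind of question a visualization layer would ask. The table name `events` and all function names are hypothetical.

```python
import queue
import sqlite3

# Stand-ins (assumptions, not the talk's actual components):
#   queue.Queue        -> the Kafka queue
#   sqlite3 in memory  -> the in-memory SQL store (MemSQL in the talk)
#   a GROUP BY query   -> a question asked by the visualization layer


def produce(topic, messages):
    """Publish each (item, quantity) message onto the queue."""
    for message in messages:
        topic.put(message)


def consume(topic, db):
    """Drain the queue and insert every message into the SQL store."""
    stored = 0
    while True:
        try:
            item, quantity = topic.get_nowait()
        except queue.Empty:
            break  # nothing left to consume
        db.execute(
            "INSERT INTO events (item, quantity) VALUES (?, ?)",
            (item, quantity),
        )
        stored += 1
    db.commit()
    return stored


def totals_by_item(db):
    """Return the running total per item, as a dashboard query might."""
    rows = db.execute(
        "SELECT item, SUM(quantity) FROM events GROUP BY item ORDER BY item"
    )
    return dict(rows.fetchall())


def run_demo():
    topic = queue.Queue()
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (item TEXT, quantity INTEGER)")
    produce(topic, [("mango", 540), ("apple", 10), ("mango", 60)])
    consume(topic, db)
    return totals_by_item(db)


if __name__ == "__main__":
    print(run_demo())
```

Re-running `totals_by_item` after each further `produce` and `consume` round is the toy equivalent of watching the answers change as data flows through the system.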

As we walk through this demo, we will touch on cross-data-center Kafka and MemSQL set-ups and on speed limitations, if any, and echo back to real-life use cases of a similar set-up used in Goldman’s Asset Management division for Portfolio Management & Trading.
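
For a sense of scale, the two throughput figures quoted in this abstract (a billion-plus transactions per day, and 100K transactions per second) can be related with simple arithmetic. The sketch below assumes load is spread evenly over 24 hours, which real traffic rarely is, so treat the results as averages rather than peak requirements. The function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate_per_second(transactions_per_day):
    """Average rate implied by a daily volume, assuming uniform load."""
    return transactions_per_day / SECONDS_PER_DAY


def daily_volume(transactions_per_second):
    """Daily volume implied by a sustained per-second rate."""
    return transactions_per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    # One billion per day averages out to roughly 11.6K per second.
    print(round(average_rate_per_second(1_000_000_000)))  # 11574
    # A sustained 100K per second adds up to 8.64 billion per day.
    print(daily_volume(100_000))  # 8640000000
```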

  1. Anton Gorshkov. Real-Time Analytics Visualized: Kafka + Streamliner + MemSQL + ZoomData. Please note that during the course of this presentation ZoomData products will be used and shown on the screen. Goldman Sachs has an ownership interest in ZoomData, Inc. and may have other business relationships with ZoomData, Inc. Nothing herein shall constitute an offer to sell or a solicitation of an offer to buy an interest in any entity or product. Learn more at GS.com/Engineering
  2. Initial Set-Up: > docker run kafka; > docker run memsql; > docker run zoomdata. Hardware: 2 x 4-CPU / 16GB / 80GB SSD / Intel Xeon E5-2670 @ 2.5GHz
  3. Context
  4. Start with one producer & a queue (diagram: Producer 1, Kafka)
  5. Add-a-Sink (diagram: Producer 1, Kafka, In-Memory SQL RDB, Consumer)
  6. Add a Real-Time Visualization (diagram: Producer 1, Kafka, In-Memory SQL RDB, RT Vis, Consumer)
  7. Enterprise Grade?
  8. Resilience: Don’t lose data; Be Up; Deliver (at least once and in-order)
  9. Bad Data
  10. Schema Evolution
  11. How “I” is the BI? Is it a “view” or a “do work” layer?; Data-at-Rest vs Data-at-Motion; Pull vs Push; Consistency to other vis layers
  12. Will It Scale? Throughput, Concurrency, Size (and at what cost…)
  13. Will it Scale? (diagram: Producer 1, Kafka, In-Memory SQL RDB, RT Vis, Producer 2, Producer n, Consumer)
  14. Audience Participation Time… (diagram: Kafka, In-Memory SQL RDB, RT Vis, Consumer; phone number (620) 487-2222)
  15. Audience Participation Time… Send a text "[Fruit] [Quantity]" to (620) 487-2222, example: mango 540
  16. Adaptability (diagram: Producer 1, Kafka, In-Memory SQL RDB, RT Vis, Producer 2, Producer n, Elastic, Kibana, Consumer, Kafka Connect)
  17. Representative Deployment (diagram: Direct Sink, Spark Streaming, IMRDBMS, Data Lake, Kafka, Source DBs, Order Mgmt, Batch ETL, Cache, VertX)
  18. Learn more at GS.com/Engineering
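
The slides on resilience ("Deliver (at least once and in-order)") and bad data do not show how the demo handled either concern, so the following is only one possible approach, sketched under stated assumptions. It parses the "[Fruit] [Quantity]" text format from the audience-participation slide, sets unparseable messages aside instead of crashing, and skips redelivered messages by remembering the highest offset already applied. The `(offset, text)` pairs, the lower-casing of fruit names, and the names `parse_message` and `IdempotentSink` are all hypothetical. The offsets are assumed to arrive in increasing order, as they would from a single ordered partition.

```python
def parse_message(text):
    """Parse '[Fruit] [Quantity]' text such as 'mango 540'.

    Returns (fruit, quantity); raises ValueError for malformed input.
    """
    parts = text.split()
    if len(parts) != 2:
        raise ValueError(f"expected 2 fields, got {len(parts)}: {text!r}")
    fruit, quantity_text = parts
    quantity = int(quantity_text)  # raises ValueError if not an integer
    return fruit.lower(), quantity  # lower-casing is an assumption


class IdempotentSink:
    """Applies each offset at most once, so redelivery does not double count."""

    def __init__(self):
        self.totals = {}       # running quantity per fruit
        self.rejected = []     # bad messages kept aside for inspection
        self.last_offset = -1  # highest offset already handled

    def handle(self, offset, text):
        """Process one delivery; return False if it was a duplicate."""
        if offset <= self.last_offset:
            return False  # already handled, ignore the redelivery
        try:
            fruit, quantity = parse_message(text)
        except ValueError as error:
            self.rejected.append((offset, text, str(error)))
        else:
            self.totals[fruit] = self.totals.get(fruit, 0) + quantity
        self.last_offset = offset
        return True


def replay(deliveries):
    """Feed a sequence of (offset, text) deliveries through a fresh sink."""
    sink = IdempotentSink()
    for offset, text in deliveries:
        sink.handle(offset, text)
    return sink


if __name__ == "__main__":
    sink = replay([
        (0, "mango 540"),
        (1, "apple ten"),  # bad data: quantity is not a number
        (1, "apple ten"),  # redelivery of offset 1
        (2, "Mango 60"),
        (2, "Mango 60"),   # redelivery of offset 2
    ])
    print(sink.totals)
    print(len(sink.rejected))
```

Tracking a single last-applied offset only works when delivery is in order; with out-of-order or multi-partition input, a per-partition offset or a set of seen message IDs would be needed instead.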

Editor's Notes

  • Our online Engineering Hub (gs.com/engineering) is regularly updated with profiles of our latest projects, our engineers and our activities in the community.

    Take a moment to visit the hub yourself and find at least one article you can reference or talk to….