In this webinar, Charles Rich, VP of Product Management at jKool, shares their journey with DataStax: how jKool knew from the start that traditional relational databases could not meet the scalability and availability demands of time-series data, and why they turned to DataStax Enterprise for its performance and powerful enterprise search and analytics capabilities.
How jKool Analyzes Streaming Data in Real Time with DataStax
1. How jKool Analyzes Streaming Data in Real Time with DataStax
Charles Rich
VP of Product Management
jKool – jKoolcloud.com
Thank you for joining. We will begin shortly.
2. Webinar Housekeeping
All attendees placed on mute
Input questions at any time using the online interface
22. Thank you!
Input questions at any time using the online interface
More information on jKool at: jKoolCloud.com
Editor’s Notes
Choices we had to make and the architectural decisions to build a system for both real-time and historical…
For Java applications initially, with a RESTful API for any app
Open source collectors
Log4J, SLF4J, Logback, JMX, HTTP
Spark
RESTful API…
More coming…
Real-time, in-memory analytics
Operational Intelligence for machine data
Analyze & Visualize: Logs & Metrics & Transactions
Gain insight, root cause, understand application behavior
Reduce MTTR (mean-time-to-problem-resolution)
Leverage NoSQL and Open source
Deliver Operational Intelligence for machine data
Analyze your logs & metrics in real-time (& historical)
Spot patterns, trends, behavior
SaaS or On-Premise
Built ground up on Big data analytics platforms
NoSQL, STORM, Spark, Kafka
Lightweight, simple, open-source instrumentation
Improved cost/benefit
Keep developers developing and enable app support to analyze app behavior, determine causality, and resolve issues
Reduce time associated with manually analyzing logs
Improve productivity of your DevOps, Application teams
Keep developers coding…enable app support
Benefits:
Fix faster
Release sooner
Be proactive
For the Business:
Focus your time on what matters to your business
Quickly identify risks and opportunities
Learn what’s important – what you didn’t know…
Exploit hidden & perishable insights
Turn machine data into insight
Detect preventable losses…
if you knew, you could act now…
Know your application and how it is used
Just deployed a new feature? Are people using it? Was it worth the cost?
Real-time means analyzing before data is persisted…
We created FatPipes to manage this, built around STORM/Spark with Kafka/JMS as the messaging infrastructure
Process data without waiting for a write to complete – no disk I/O; split the stream and analyze it in flight
Two parallel architectures: one to handle historical data and one for real-time
(eventually… both real-time and historical must reconcile)
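The split-before-persist idea can be sketched in plain Java. This is only an illustration – the actual pipeline uses STORM/Spark with Kafka/JMS as the messaging layer, and the class and method names here are invented for the sketch:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: each incoming event fans out to a real-time analysis path and
// a historical write path in parallel, so analysis never waits on disk I/O.
// The queues are stand-ins for the Kafka/JMS infrastructure.
public class SplitPipeline {
    final BlockingQueue<String> realtimePath = new LinkedBlockingQueue<>();
    final BlockingQueue<String> historicalPath = new LinkedBlockingQueue<>();

    // Fan each incoming event out to both paths; neither blocks the other.
    public void ingest(String event) {
        realtimePath.add(event);
        historicalPath.add(event);
    }

    // Real-time side: analyze in memory, no disk I/O (stand-in for CEP).
    public int analyzeAll() {
        int alerts = 0;
        String e;
        while ((e = realtimePath.poll()) != null) {
            if (e.contains("ERROR")) alerts++;
        }
        return alerts;
    }

    // Historical side: in the real system this would be a batched write
    // to the store; here we just count the events drained.
    public int persistAll() {
        int written = 0;
        while (historicalPath.poll() != null) written++;
        return written;
    }

    public static void main(String[] args) {
        SplitPipeline p = new SplitPipeline();
        p.ingest("INFO start");
        p.ingest("ERROR timeout");
        p.ingest("INFO done");
        System.out.println(p.analyzeAll() + " alert(s), "
                + p.persistAll() + " event(s) persisted");
    }
}
```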
User interacts with Real-time via JKQL (jKool Query Language)
An English-like query language for analyzing data in motion and at rest.
“Subscribe” verb for real-time updates
Clustered computing was selected to scale with the demands of the workload
STORM – distribution of CEP (also helpful for distributing data to specific tasks, conditionally)
JMS/Kafka for distributing data amongst nodes in our real-time grids
CEP for processing streams and publishing results to clients via JMS/Kafka
Spark jobs will crunch the data and then write back to Cassandra
Created our own micro-services architecture (FatPipes) which runs on top of:
STORM/JMS/Kafka
FatPipes can be embedded or distributed
Real-time Grid
Feeds tracking data and real-time queries to CEP and back
In our experience, customers didn’t know what they needed to store until they actually needed it…but by then it is too late…hence, store everything…
The architectural requirements for historical data are very different from real-time: deliver fast response times and provide user-defined KPIs
Scale must be there for interaction as data comes in – not how many TBs you can analyze, but how fast you can go and keep up with the data streams
Can’t build everything, so to accelerate time to market, how much open-source could we leverage?
For elasticity, we can add nodes horizontally.
Can’t test all possibilities and then select…not agile…not enough time.
Long-term analytics needs differ from real-time; weed out whatever would slow real-time down
Providing this as a service, estimating capacity is also a challenge.
We are on DSE 4.6.5 and going shortly to 4.7 (today is: 10.13.15 …)
We tried using CQL
CQL (Cassandra Query Language)
Ad-hoc queries would be very hard, as CQL query capabilities are very limited. We would need to define tables and indexes for every possible query permutation, and the user would need to know the event_id – too much to be usable
Too slow. We only use CQL for admin tasks
Lucene addresses the above problem, but adds its own issues.
We started with Lucene and did inline inserts and the time to index was too long
For each Cassandra insert, we had to write a Lucene doc…since Lucene has no in-place update, we had to read, delete, and then write – a series of batch operations, too slow for our real-time goals
Solr helped with this – we write to Cassandra and Solr handles the indexing (automagically) for us. Solr is a web app on top of Lucene
We do use Solr indexes
jKQL does invoke Solr queries. But we needed to enhance this: as a multi-tenant solution, we pass our repository_id to ensure we get only the data appropriate to that tenant.
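A hypothetical sketch of that tenant scoping: Solr’s standard fq (filter query) parameter restricts results to one repository_id. jKool’s actual query rewrite is internal; the method here is invented purely for illustration:

```java
// Sketch: AND a tenant filter onto every Solr query so a query only
// ever sees that tenant's data. repository_id is the tenant key;
// fq is Solr's standard filter-query parameter.
public class TenantQuery {
    public static String scope(String solrQuery, String repositoryId) {
        return solrQuery + "&fq=repository_id:" + repositoryId;
    }

    public static void main(String[] args) {
        System.out.println(scope("q=severity:ERROR", "tenant42"));
        // q=severity:ERROR&fq=repository_id:tenant42
    }
}
```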
We use 3 nodes in a Cassandra Cluster and data ingested is replicated to Solr clusters with 3 nodes (they have both Cassandra and Solr)
Data-at-rest – we can ingest as fast as Cassandra can handle it, using eventual consistency. The data is distributed across Cassandra and Solr.
We use the DataStax Java driver. Data is written to a coordinator node, which handles distribution to the other nodes.
Quorum means 1/2 + 1. You would say that "we use consistency level "quorum" for queries", which means half +1 of the replicas must respond.
Like if you were taking a vote and in order for the vote to be valid, you need a quorum of members to be present.
Has the same meaning here. If your replication factor is 3, you need 2 of the 3 nodes to respond.
Quorum is replicas/2 + 1 using integer division: half of 3 is 1 (1.5 truncated), plus 1 = 2
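The integer-division arithmetic spelled out as one small method (not jKool code, just the rule):

```java
// Cassandra's quorum rule: integer division of the replication
// factor by 2, plus 1.
public class Quorum {
    // For replicationFactor = 3: 3 / 2 = 1 (1.5 truncated), + 1 = 2
    public static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    public static void main(String[] args) {
        System.out.println(quorum(3)); // prints 2
    }
}
```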
We use consistency level ONE for writes
For CQL reads (admin tasks) we use QUORUM
All other reads use Solr - the jKQL queries you see on dashboard are all coming from Solr.
STORM for ingesting
Spark for processing data (compute framework)
Simple to setup & scale for clustered deployments
Scalable, resilient, fault-tolerant (easy replication)
Ability to have data automatically expire (TTL – necessary for our pricing model)
Configurable replication strategy
Great for heavy write workloads
Write performance was better than with Hadoop.
Insert rate was of paramount importance for us – getting data in as fast as possible was our goal
Java driver balances the load amongst the nodes in a cluster for us (master-slave would never have worked for us)
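The peer-to-peer balancing can be pictured as simple round-robin over the node list. The real driver’s load-balancing policies are more sophisticated (token-aware, latency-aware); this class is invented only to show why no single master becomes a bottleneck:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of peer-to-peer load balancing: requests rotate
// across all nodes instead of funneling through a single master.
public class RoundRobin {
    private final List<String> nodes;
    private final AtomicInteger next = new AtomicInteger();

    public RoundRobin(List<String> nodes) {
        this.nodes = nodes;
    }

    // Wrap around the node list; every node gets an equal share.
    public String pick() {
        return nodes.get(Math.floorMod(next.getAndIncrement(), nodes.size()));
    }

    public static void main(String[] args) {
        RoundRobin rr = new RoundRobin(List.of("node1", "node2", "node3"));
        for (int i = 0; i < 4; i++) {
            System.out.println(rr.pick());
        }
        // node1, node2, node3, node1
    }
}
```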
Solr provides a way to index all incoming data - essential
DSE provides a nice integration between Cassandra and Solr