Sarah: CEO-Finance-Report pipeline seems to be slow today. Why?
Jeeves: SparkSQL query dbt_fin_model in CEO-Finance-Report is running 53% slower on 2/28/2021. Data skew issue detected. Issue has not been seen in last 90 days.
Jeeves: Adding 5 more nodes to cluster recommended for CEO-Finance-Report to finish in its 99th percentile time of 5.2 hours.
Who is Jeeves? An experienced Spark developer? A seasoned administrator? No, Jeeves is a chatbot created to simplify data operations management for enterprise Spark clusters. The chatbot is powered by advanced AI algorithms and an intuitive conversational interface that together provide answers to get users out of problems quickly. Instead of being stuck at screens displaying logs and metrics, users can now have a more refreshing experience through a two-way conversation with their own personal Spark expert.
We presented Jeeves at Spark Summit 2019. In the two years since, Jeeves has grown up a lot. Jeeves can now learn continuously as telemetry information streams in from more and more applications, especially SQL queries. Jeeves now “knows” about data pipelines that have many components. Jeeves can also answer questions about data quality in addition to performance, cost, failures, and SLAs. For example:
Tom: I am not seeing any data for today in my Campaign Metrics Dashboard.
Jeeves: 3/5 validations failed on the cmp_kpis table on 2/28/2021. Run of pipeline cmp_incremental_daily failed on 2/28/2021.
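As an illustration of how a reply like the one above could be assembled, here is a minimal Python sketch that turns a list of validation outcomes into a one-line summary. The `ValidationResult` type, its fields, and the check names are hypothetical; the talk does not show how Jeeves builds its answers.

```python
from dataclasses import dataclass
from datetime import date
from typing import Sequence


@dataclass
class ValidationResult:
    """Outcome of one data quality check (hypothetical structure)."""
    check_name: str
    passed: bool


def summarize_validations(table: str, run_date: date,
                          results: Sequence[ValidationResult]) -> str:
    """Build a chat-style summary of how many validations failed on a table."""
    failed = sum(1 for r in results if not r.passed)
    # Format the date as M/D/YYYY to match the chat examples.
    day = f"{run_date.month}/{run_date.day}/{run_date.year}"
    return f"{failed}/{len(results)} validations failed on the {table} table on {day}."


# Illustrative data only: three failing and two passing checks.
results = [
    ValidationResult("row_count_not_zero", False),
    ValidationResult("date_column_is_today", False),
    ValidationResult("spend_not_null", False),
    ValidationResult("campaign_id_unique", True),
    ValidationResult("clicks_non_negative", True),
]
message = summarize_validations("cmp_kpis", date(2021, 2, 28), results)
print(message)
```

A real chatbot would pull these outcomes from the validation tool's own result store rather than from a hard-coded list.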
This talk will give an overview of the newer capabilities of the chatbot, and how it now fits in a modern data stack with the emergence of new data roles like analytics engineers and machine learning engineers. You will learn how to build chatbots that tackle your complex data operations challenges.
Jeeves Grows Up: An AI Chatbot for Performance and Quality
1. 1
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Shivnath Babu
CTO/Cofounder @ Unravel
Adjunct Professor @ Duke University
2. 2
About the speaker
Shivnath Babu
• Cofounder/CTO at Unravel
• Adjunct Professor of Computer Science at Duke University
• Focusing on manageability of data pipelines and the modern data stack
• Recipient of US National Science Foundation CAREER Award, IBM Faculty Award, HP Labs Innovation Research Award
3. 3
Unravel radically simplifies DataOps & has strong adoption across platforms & industries
• Brings together information about all your apps, clusters, resource utilization, users, & datasets in a single place
• Creates end-to-end view of data pipelines to easily track & understand issues
• Tracks & reports on usage across environments
• Checks for & alerts on anomalous behavior
• Uses AI/ML to troubleshoot & optimize apps to meet desired performance & cost needs
• Spots & fixes inefficient usage
• Ensures efficiency, quality, & performance of all apps in development & production
6. 6
#UnifiedAnalytics #SparkAISummit
The UNhappy Spark user
• “I have no clue which cloud instance type to pick for my workload”
• “My cloud costs are getting out of control. Help!”
• “I have no idea why my app is slow”
• “My app failed and I don’t know why!”
7. 7
Typical app failure in Spark
• Many levels of dependent stack traces
• Identifying the root cause is hard and time-consuming
8. 8
Spark User: “My app failed and I don’t know why!”
Chatbot: “I know that sucks! Let me take a look here …”
Chatbot: “I see the problem. Executors are running out of memory”
Chatbot: “Setting spark.executor.memory to 12g fixes the problem. I have verified it. See this run here”
Spark User: “Wow. Thanks. You are awesome!”
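To make the chatbot's recommendation concrete, the following sketch shows one way a setting such as spark.executor.memory=12g could be passed to a job through the `--conf` flag of `spark-submit`. The helper function and the application name `my_app.py` are hypothetical and only illustrate the idea; how the chatbot itself applies or verifies the fix is not described on the slide.

```python
import shlex
from typing import Dict, List


def build_spark_submit(app_path: str, conf: Dict[str, str]) -> List[str]:
    """Return a spark-submit argument list with one --conf flag per setting."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app_path)
    return args


# my_app.py is a placeholder application name.
args = build_spark_submit("my_app.py", {"spark.executor.memory": "12g"})
command = shlex.join(args)
print(command)
```

Building the argument list separately from running it keeps the recommended change easy to review before the job is resubmitted.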
20. 20
Now every company is a data company
• Data Pipelines: Most companies have 10+ mission-critical data pipelines
• Data Stack: The data stack for these pipelines is multi-system & complex
• DataOps: 33% (and growing) of data teams follow a DataOps practice
21. 21
We asked 200+ companies how they manage their data pipelines
• SLA misses are creating problems
• We only detect the fire after it starts!
• Our pipeline schedules are all messed up!
• We need CI/CD for our pipelines
• Fixing problems takes weeks
• Users are always complaining
• I am wasting most of my time with bad data
• Do devs ever #!$ test their pipelines?
• Two failed attempts to migrate to cloud
• Cost reduction is our #1 priority
22. 22
Effective DataOps practice is required to solve these problems with data pipelines
23. 23
We created Unravel’s Pipeline Observer to simplify DataOps
[Architecture diagram; component labels only]
• Logs, Metrics, Traces, Metadata, Conf, Events
• Real-time Store
• Correlation Services
• Baselining Service
• Root Cause Analysis Service
• Pipeline Observer UI/API
• Chatbot
• SLA Tracking UI
• Pipeline Capacity Planning
• Proactive Alerting
• Usage / Cost Chargeback UI
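The slide names a Baselining Service but does not describe how it works. As a rough sketch of the general idea, the Python below compares a run's duration with a baseline built from past runs, which is the kind of calculation behind statements such as "53% slower" or "99th percentile time". Using the median as the baseline and the inclusive quantile method are assumptions made for illustration, and the history values are invented.

```python
from statistics import median, quantiles
from typing import Sequence


def percent_slower(current_hours: float, history_hours: Sequence[float]) -> float:
    """Percent by which the current run exceeds the median of past runs."""
    baseline = median(history_hours)
    return 100.0 * (current_hours - baseline) / baseline


def p99(history_hours: Sequence[float]) -> float:
    """99th percentile of past run times.

    quantiles(n=100) returns 99 cut points; the last one is the 99th
    percentile. The inclusive method keeps the result within the observed range.
    """
    return quantiles(history_hours, n=100, method="inclusive")[-1]


# Invented run times (hours) standing in for a 90-day history.
history_hours = [3.8, 4.0, 4.1, 4.2, 5.2]
slowdown = percent_slower(6.273, history_hours)
print(f"Run is {slowdown:.0f}% slower than the baseline")
print(f"99th percentile run time: {p99(history_hours):.2f} hours")
```

A production service would likely baseline per pipeline and per stage, and would need far more history than five runs for a 99th percentile to be meaningful.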
24. 24
Demo Stack
Modern Data Stack composed of:
1. Databricks (Advanced Analytics with Spark)
2. Azure Data Lake Storage (Data Lake)
3. Airflow (Orchestration)
4. dbt (Data Transformation)
5. Great Expectations (Data Quality/Validation)
6. Slack (Chatbot, Team Comm., & Alerting)
7. Unravel (End-to-end Observability)
25. 25
Demo Scenarios
1. Pipeline in danger of missing performance SLA
2. Pipeline in danger of cost overrun
3. Pipeline in danger of breaking due to data quality problems
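The three scenarios can be pictured as three checks on an in-flight pipeline run. The sketch below is a simplified illustration, not the demo's logic: the `PipelineRun` fields are hypothetical, and the projections assume time and cost grow linearly with progress, which real pipelines often violate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PipelineRun:
    """Snapshot of a running pipeline (hypothetical fields)."""
    elapsed_hours: float
    fraction_complete: float  # must be in (0, 1]
    cost_so_far: float
    failed_validations: int


def risks(run: PipelineRun, sla_hours: float, budget: float) -> List[str]:
    """Return which of the three demo scenarios the run appears to be in."""
    # Assumption: linear extrapolation from progress so far.
    projected_hours = run.elapsed_hours / run.fraction_complete
    projected_cost = run.cost_so_far / run.fraction_complete
    found = []
    if projected_hours > sla_hours:
        found.append("performance SLA")
    if projected_cost > budget:
        found.append("cost overrun")
    if run.failed_validations > 0:
        found.append("data quality")
    return found


# Invented numbers: halfway done after 3 hours against a 5.2-hour SLA.
run = PipelineRun(elapsed_hours=3.0, fraction_complete=0.5,
                  cost_so_far=40.0, failed_validations=0)
print(risks(run, sla_hours=5.2, budget=100.0))
```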
27. 27
In summary
AI-driven DataOps to manage Data Pipelines for the New Data Stack
• Develop & manage data pipelines with ease
• Save time & money
Sign up for a free trial. We value your feedback!
https://unraveldata.com/saas-free-trial
We are hiring
shivnath@unraveldata.com