SlideShare ist ein Scribd-Unternehmen logo
1 von 70
Washington DC DataOps
Meetup
www.datakitchen.io
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Topics
Why DataOps Is Essential
Seven Steps to DataOps (+3)
Next Steps With DataOps
Copyright © 2019 by DataKitchen, Inc. All Rights
Reserved.
Data
Analytics
has a
dirty
secret:
FAILURE
Data Analytics is Hot:
• ‘New Oil’ amount of data increasing fast
• Buzz: Big Data, Data Science, Data Lakes, Machine
Learning, AI, etc.
With Rampant Failure:
• 87% of data science projects never get to production.
• Data analytics investment up, yet “data driven”
organizations down 37% to 31%
• 60% of all data analytic projects fail
• 79% of data projects have too many errors
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Challenge: Your Team’s Time Not Well Spent
Percentage
Time Team
Spends Per
Week
Current
Errors & Operational
Tasks
New Features & Data
For Customers
Improvements & Debt
Challenges:
• Many people involved
• Complex toolchains
• Innovative insights cannot be
delivered at the speed of
business
• Collaboration and
coordination across teams,
locations, and clouds/data
centers
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Challenge: Data Analytics is like the US Auto
Industry in the 1970s
Current
High Errors
Production
Errors
Data Analytics
Team
Deployment
Latency
Weeks, Months
Dev Prod
Challenges:
• Slow to add new features,
rapidly address consumer
requests, changing data sets
• Errors in dashboards, reports,
and data pipelines - lack of
trust
• Lacking pipeline execution and
visibility on premise, multi-
cloud, etc.
• Team morale
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
There are lots of people who work with data
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
They work in teams
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
With a massive, fragmented toolchain
Data Sources and/or Data Lake
ETL Tools (Informatica, Talend, etc.)
Databases (Redshift, SQL Server, etc.)
Data Science Tools (Python, DataIku, etc.)
Data Catalog Tools (Alation, wiki)
Data Visualization Tools (Tableau, etc)
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
They work in teams together
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
They work in teams together
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
They may work for the same boss
Chief Data Officer
Chief Analytics Officer
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Or not
CIO or CDO
Line of Business
Executives
CEO
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
A many to many dev/ops relationship
Data Specific Production
Team
Do Operations
Themselves
Some other Ops team
“DEV” “OPS”
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
And they source data from internal and
external system
CRM
ERP
Supply Chain
Website
Financial
HR
Open Data
Syndicated
Databases
APIs
Files
Internal and External
Systems/Sources
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
They run a ‘Factory’ of Insight
CRM
ERP
Supply Chain
Website
Financial
HR
Open Data
Syndicated
Databases
APIs
Files
Access:
Python Code
Transform:
SQL Code,
ETL
Model:
R Code
Visualize:
Tableau
Workbook
Report:
Tableau
Online
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Currently, Teams Have High Errors
DataKitchen/Eckerson Survey (May 2019)
Forthcoming DataKitchen / Eckerson Research Survey of Medium – Large Companies US And Abroad
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Currently, Teams Struggle to Deploy
DataKitchen/Eckerson Survey (May 2019)
Forthcoming DataKitchen / Eckerson Research Survey of Medium – Large Companies US And Abroad
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
DevOps has resulted in a transformative
improvement in Software Development
• High-performing IT
organizations deploy 200
times more frequently
• They have 24 times faster
recovery times and three
times lower change
failure rates
• And they spend 22
percent less time on
unplanned work and
rework
Source: State of DevOps Report
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Lean has resulted in a transformative
improvement in manufacturing
• Lean manufacturing improves
efficiency, reduces waste, and
increases productivity.
• The benefits are manifold:
• Increased product quality
• Reduces rework
• Employee satisfaction
• Higher profits
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Why are world class, game changing data
analytics so difficult?
Challenges:
• Innovative insights cannot be
delivered at the speed of business
• Data Quality can be compromised
• New innovative technologies are
difficult to test, deploy and
leverage
Dichotomies:
• Develop & protect production
• Experimentation & reproducibility
• Central control & self service
• Group sharing & individual control
• Reuse & branching
Key Observation: Innovation Requires
Iteration, And Iteration Is Hard
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
From To
Change Fear Change Velocity
Manual Operations Automated Operations
Hope For Quality Integrated Quality
Hero Mentality Repeatable Processes
Tool Centric Code Centric
Vendor Lock-In Diverse Tools
How To Succeed?
A Mindset Change to DataOps…
…to power your highly agile data culture.
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
DataOps – Transformative to Data Analytics
DataOps – The technical practices,
cultural norms, and architecture that
enable:
• Rapid experimentation and innovation
for the fastest delivery of new insights
to our customers
• Low error rates
• Collaboration across complex sets of
people, technology, and
environments
• Clear measurement and monitoring of
results
Source: Gartner
“Organizations that adopt a DevOps- and DataOps-based
approach are more successful in implementing end-to-end,
reliable, robust, scalable and repeatable solutions.”
Sumit Pal, Gartner, November 2018
People,
Process,
Organization
Technical
Environment
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Topics
Why DataOps Is Essential
Seven Steps to DataOps
Next Steps With DataOps
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
What to do? Seven Steps to DataOps (+3)
1. Orchestrate Two Journeys
2. Add Tests And Monitoring
3. Use a Version Control System
4. Branch and Merge
5. Use Multiple Environments
6. Reuse & Containerize
7. Parameterize Your Processing
+ Three (Architecture, Metrics and Inter/Intra Team Collaboration)
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Orchestrate data to customer value
Analytic process are like manufacturing: materials (data) and
production outputs (refined data, charts, graphs, model)
Access:
Python Code
Transform:
SQL Code, ETL
Model:
R Code
Visualize:
Tableau
Workbook
Report:
Tableau Online
❶
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Speed deployment to production
Analytic processes are like software development: deliverables
continually move from development to production
Data
Engineers
Data
Scientists
Data
Analysts
Diverse Team
Diverse Tools
Diverse Customers
Business
Customer
Products &
Systems
❶
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Innovation and Value Pipeline Together
Focus on both orchestration and deployment while automating &
monitoring quality
Don’t want break production
when I deploy my changes
Don’t want to learn about data quality issues from my customers
❶
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Add Automated Monitoring And Tests
Move Fast and Count Things
Monitoring: To ensure that
during in the Value Pipeline, the
data quality remains high.
Tests: Before promoting work,
running new and old tests gives
high confidence that the change did
not break anything in the
Innovation Pipeline
❷
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Automate Monitoring & Tests In Production
Test Every Step And Every Tool in Your Value Pipeline
Are your outputs
consistent?
And Save Test Results!
Are data inputs
free from
issues?
Is your business logic
still correct?
Access:
Python Code
Transform:
SQL Code, ETL
Model:
R Code
Visualize:
Tableau
Workbook
Report:
Tableau Online
❷
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Support Multiple Types Of Tests
Testing Data Is Not Just Pass/Fail in Your Value Pipeline
Support Test Types
• Error – stop the line
• Warning – investigate later
• Info – list of changes
Keep Test History
• Statistical Process Control
❷
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Example Tests (Basic)
❷
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Example Tests
Simple
❷
Example Test
More Complex
Make sure all
table counts are
the same in the
production and
development
environment
❷
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Example Test (Location Balance)
❷
Access:
Python Code
Transform:
SQL Code, ETL
Model:
R Code
Visualize:
Tableau
Workbook
Report:
Tableau Online
source 1 million
rows
database
1 million
rows
300K facts
700K
dimensions
report
300K facts
700K
dimensions
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Example Test (Historical Balance)
❷
SKU Product Product Group Volume
SKU1 P1
G1
100
SKU2 P1 50
SKU3 P2 75
SKU4 P3
G2
125
SKU5 P4 200
SKU6 P5 25
575
Production
Data, Pipeline
& Environment
Pre-Production
Data, Pipeline
& Environment
SKU Product Product Group Volume
SKU1 P1
G1
101
SKU2 P1 55
SKU3 P2 76
SKU4 P3 126
SKU5 P4
G2
200
SKU6 P5 29
587
Access:
Python
Code
Transfor
m: SQL
Code, ETL
Model:
R Code
Visualiz
e:
Tableau
Workbook
Report
: Tableau
Online
Access:
Python
Code
Transfor
m: SQL
Code, ETL
Model:
R Code
Visualiz
e:
Tableau
Workbook
Report
: Tableau
Online
Histbal
G1 225
G2 350
Histbal
G1 358
G2 229
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Production Testing and Monitoring
Lower Your Error Rates and Embarrassment!
[HOW] Test and Monitor:
• Automatically, in production
• On top of the entire tool chain
• Send alerts / notification
• Keep track of history
• Make it easy to create tests
[WHAT] Test Types:
• Traditional Data Quality
• Statistical Process Control
• Location Balance Test
• Historic Balance Test
• Business Based Tests
[WHY] Benefits:
• Less Errors
• More Innovation
• More Customer Data Trust
• Less Stress
• Less Embarrassment
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
For the Innovation Pipeline
Tests Are For Also Code: Keep Data Fixed
Deploy Feature
Run all tests here before
promoting
❷
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Automated ‘Tests’ Serve
a Dual Purpose:
1. Data Tests and
Monitoring in
Production
2. Regression, Functional
and Performance Tests
in Development
Data Fixed Data Variable
Code Fixed Value Pipeline
Code Variable
Innovation
Pipeline
Quality Your
Customer
Receives
= f (data, code)
https://medium.com/data-ops/disband-your-impact-review-
board-automate-analytics-testing-42093d09fe11
Duality of Tests❷
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Use a Version Control System
At The End Of The Day, Analytic Work Is All Just Code
Access:
Python Code
Transform:
SQL Code,
ETL Code
Model:
R Code
Visualize:
Tableau
Workbook XML
Report:
Tableau Online
Source Code
Control
❸
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Branch & Merge
Source Code
Control
Branching & Merging enables people to safely work on their own tasks
❹
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Example Branch And Merge Pattern
Sprint 1 Sprint 2
f1 f2
f3
main / master / trunk
f5
❹
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Access:
Python Code
Transform:
SQL Code,
ETL Code
Model:
R Code
Visualize:
Tableau
Workbook XML
Report:
Tableau Online
Use Multiple Environments
Analytic Environment
Your Analytic Work Requires Coordinating Tools And Hardware
❺
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Use Multiple Environments
Provide an Analytic Environment for each branch
• Analysts need a controlled environment for their experiments
• Engineers need a place to develop outside of production
• Update Production only after all tests are run!
❺
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Sandboxes Are Complex
Analytic Environment
❺
Data Engineers,
Scientists or Analytics Team’s Analytic Tools
R
(model)
Alteryx
(business ETL)
Redshift
(data)
SQL
(ETL)
Hardware & Network
Configurations
Right Hardware and
Software Versions
Tableau
(workbook)
Python
Test Data Sets
Code Branch
Test Result
History
Analytic Environment/
Development Sandbox
Creation is Complex:
Hard to create the
right set of data,
tools, people,
history and
configuration for a
fast build test debug
cycle
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Reuse & Containerize
Containerize
1. Manage the environment for each
component (e.g. Docker, AMI)
2. Practice Environment Version Control
Reuse
1. Do not create one ‘monolith’ of code
2. Reuse the code and results
❻
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Parameterize Your Processing
Think Of Your Value Pipeline Like A Big Function
• Named sets of parameters will
increase your velocity
• With parameters, you can
vary”
• Inputs
• Outputs
• Steps in the workflow
• You can make a time machine
• Secure storage for credentials
❼
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
The Seven Steps In Action
1. Select story
2. Create branch
3. Create environment
4. Implement feature
5. Write new tests
6. Run new and existing tests
7. Check in to branch
8. Merge to parent
9. Delete environment
When sprint ends
• Deliver all completed features to
customer
• Merge sprint branch to master
• Roll un-merged features into the
next sprint
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Three Bonus Steps!
• DataOps Data Architecture
• DataOps Collaboration
• Inter and Intra Team
• DataOps Measurement
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Why a DataOps Centric Architecture?
• Canonical Data Architectures only think about production, not the
process to make changes to production.
• A DataOps Data Architecture makes the steps to change what is
in production a “central idea.”
• Think first about changes over time to your code, your servers,
your tools, and monitoring for errors are first class citizens in the
design.
• Why?
• It is a little like designing a mobile phone with a fixed battery.
• You can end up with processes characterized by unplanned
work, manual deployment, errors, and bureaucracy.
• A ‘Right to Repair’ Data Architecture
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Canonical Data Architecture
Does not reflect collaboration and operations
Production
Environment
Source Data
Data
Customers
Raw
Lake
Data
Engine-
ering
Refined
Data
Data
Science
Data
Viz.
Data
Govern-
ance
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
DataOps Data Architecture
Spans tool chain & environments
Cloud/On-Prem
Production
EnvironmentTest
Dev
Source Data
Data
Customers
Raw
Lake
Data
Engine-
ering
Refined
Data
Data
Science
Data
Viz.
Data
Govern-
ance
Orchestrate, Monitor, Test
Orchestrate, Monitor, Test
Orchestrate, Monitor, Test
DataKitchen DataOps
Storage
&
Version
Control
History &
Metadata
Auth &
Permissions
Environ-
ment
Secrets
DataOps
Metrics &
Reports
Automated
Deployment
EnvironmentCreation
andManagement.
DataOps
Team
Chris – DataOps
Engineer
Eric – Production
Engineer
Betty – Data
Engineer
This Is A Multi Step, Multi Person, Multi
Environment Process To Make this Request a
Reality
Challenges:
• How to leverage best practices and re-use?
• How to collaborate and coordinate work?
• How to ease movement between team
members with many tools and environments?
• How to maintain security?
• How to automate work and reduce manual
errors?
DataOps and Intra-Team Coordination
Pat – Data
Scientist
Chris – DataOps
Engineer
Production Environment:
• Separate
Hardware/Software
Environment
• Secure
• No Access By
Developers
• Managed by Eric
• Separate Credentials
PRODUCTION
DEVELOPMENT
Development
Environment:
• Separate
Hardware/Software
Environment
• Secure
• Access By Data
Engineers, Data
Scientists, Analysts and
DataOps Engineers
• Setup by Chris (DataOps
Engineer)
• Separate Credentials
Eric – Production
Engineer
Betty – Data
Engineer
Intra-Team Coordination
Pat – Data
Scientist
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Inter-Team Coordination
Multiple organizations, perhaps multiple bosses
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Inter-Team Coordination: Two Locations,
Multiple Tools
Home Office Team Local Office ‘Self-
Service’ Team
VP Marketing
Data
Engineer
Data
Scientist
Centralized,
Weekly
Cadence of
Changes
Data
Analyst
Distributed,
Daily/Hourly
Cadence of
Changes
Boston
New Jersey
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Challenges With Coordination
Data
Engineer
Data
Scientist
Data
Analyst
Make a change in schema? Break Reports?
Add New Data SetsNot Available For All?
Change Report Calculations Inconsistencies?
New Data & Schema Update/New Report Not Working?
Home Office Team Local Office ‘Self-
Service’ Team
VP Marketing
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Shared Result, Separate Responsibilities
Home Office Team
Data
Engineer
Data
Scientist
Local Office ‘Self-
Service’ Team
Calculate:
SQL
Segment:
Python
Transform
SSIS
Load
SQLServer
Deploy:
Tableau
Publish:
T Server
Add Data
Alteryx
Data
Analyst
VP Marketing
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Overall Orchestration
Home Office Team
Data
Engineer
Data
Scientist
Local Office ‘Self-
Service’ Team
Calculate:
SQL
Segment:
Python
Transform
SSIS
Load
SQLServer
Deploy:
Tableau
Publish:
T Server
Add Data
Alteryx
Data
Analyst
VP Marketing
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
DataOps Process Analytics
Analytic teams are not very analytic about
measuring and improving their internal work
• Prove your teams’ value, measure:
• Team and individual productivity
• Production error rates
• Data provider error rates
• SLAs
• Production deployment rates
• Release environments
• Tests Coverage
• Customizable with data export to fit your
company’s needs
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
DataOps: data about data
Statistical process control graphs monitoring “bad IDs” and raw row counts
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Error Rates
Decline in
Production
On Time Delivery
within SLA, &
decreasing build
time
Per Project
Analytics
Productivity:
Recipe Work
Increasing
Team
Collaboration
Increased Deploys Between
Environments
Increased
Number of
Automated Tests
Increasing
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Topics
Why DataOps Is Essential
Seven Steps to DataOps (+3)
Next Steps With DataOps
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
DataOps Benefit: Time Well Spent
After DataOps
Percentage
Time Team
Spends Per
Week
Before DataOps
New Features & Data
For Customers
Errors & Operational
Tasks
New Features & Data
For Customers
Improvements & Debt
Errors & Operational
Tasks
Process Improvements
& Tech Debt Reduction
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
DataOps Benefit: Faster, Better & Happier
After DataOpsBefore DataOps
High Errors
Production
Errors Low Errors
Data Analytics
Team
Deployment
Latency
Weeks, Months
Dev Prod
Hours & Mins
Dev Prod
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Write down your answer
What can I take from
this session and apply
tomorrow?
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Where to Start With DataOps?
Look to manufacturing/DevOps
‘Theory of Constraints’
• Where are ‘bottlenecks’ (or
constraints in your data science
or analytic process?
• What impedes from creating new
insight for you customers?
• Iterate & improve
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Where to Start
Part 1: [Factory] “I don’t want to learn
about data quality issues from my
customers”
Part 2: [Flow] “I don’t want break
production when I deploy my changes”
Part 3: [Intra-team coordination] “I don’t
want my team to struggle working together”
Part 4: [Inter-team coordination] “I don’t
like the Hatfields vs Mccoys war between
different analytic teams”
Part 5: [Management] “How to measure
team progress with and show results to
leadership?”
Errors : Bottleneck / Constraint
Deployment : Bottleneck / Constraint
Team Coordination : Bottleneck / Constraint
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Where to start
Part 1: [Factory] “I don’t want to learn
about data quality issues from my
customers”
Part 2: [Flow] “I don’t want break
production when I deploy my changes”
Part 3: [Intra-team coordination] “I don’t
want my team to struggle working together”
Part 4: [Inter-team coordination] “I don’t
like the Hatfields vs Mccoys war between
different analytic teams”
Part 5: [Management] “How to measure
team progress with and show results to
leadership?”
Errors, Deployment, and Team
Coordination Are All Bottlenecks That
GOAL: Flow of Innovation
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
DataKitchen Software Platform
Our cloud platform orchestrates data to customer value,
speeds features to production, and automates quality.
Kitchens
Recipes & Tests
Orders
Ingredients
1. Orchestrate Two Journeys
2. Add Tests And Monitoring
3. Use a Version Control System
4. Branch and Merge
5. Use Multiple Environments
6. Reuse & Containerize
7. Parameterize Your Processing
+ 3 Bonus Steps
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Learn More about DataKitchen & DataOps
• For these slides, contact me:
• cbergh at datakitchen dot io
• DataOps Manifesto:
• http://dataopsmanifesto.org
• Free DataOps Cookbook:
• https://www.datakitchen.io/dataops-
cookbook-main.html
• Excerpt from New Unicorn Project Book on
DataOps
• https://www.datakitchen.io/unicorn-
project.html

Weitere ähnliche Inhalte

Was ist angesagt?

MLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLMLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLJordan Birdsell
 
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesPutting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesDATAVERSITY
 
DataOps introduction : DataOps is not only DevOps applied to data!
DataOps introduction : DataOps is not only DevOps applied to data!DataOps introduction : DataOps is not only DevOps applied to data!
DataOps introduction : DataOps is not only DevOps applied to data!Adrien Blind
 
DevOps + DataOps = Digital Transformation
DevOps + DataOps = Digital Transformation DevOps + DataOps = Digital Transformation
DevOps + DataOps = Digital Transformation Delphix
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationDATAVERSITY
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshJeffrey T. Pollock
 
GitHub Copilot.pptx
GitHub Copilot.pptxGitHub Copilot.pptx
GitHub Copilot.pptxLuis Beltran
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Databricks
 
Data Architecture for Solutions.pdf
Data Architecture for Solutions.pdfData Architecture for Solutions.pdf
Data Architecture for Solutions.pdfAlan McSweeney
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
Best Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data PipelinesBest Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data PipelinesEric Kavanagh
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & DeltaDatabricks
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionProvectus
 
Github Copilot vs Amazon CodeWhisperer for Java developers at JCON 2023
Github Copilot vs Amazon CodeWhisperer for Java developers at JCON 2023Github Copilot vs Amazon CodeWhisperer for Java developers at JCON 2023
Github Copilot vs Amazon CodeWhisperer for Java developers at JCON 2023Vadym Kazulkin
 
Ibm data governance framework
Ibm data governance frameworkIbm data governance framework
Ibm data governance frameworkkaiyun7631
 
Data Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesData Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesDATAVERSITY
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaKai Wähner
 

Was ist angesagt? (20)

adb.pdf
adb.pdfadb.pdf
adb.pdf
 
MLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLMLOps - The Assembly Line of ML
MLOps - The Assembly Line of ML
 
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesPutting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
 
DataOps introduction : DataOps is not only DevOps applied to data!
DataOps introduction : DataOps is not only DevOps applied to data!DataOps introduction : DataOps is not only DevOps applied to data!
DataOps introduction : DataOps is not only DevOps applied to data!
 
DevOps + DataOps = Digital Transformation
DevOps + DataOps = Digital Transformation DevOps + DataOps = Digital Transformation
DevOps + DataOps = Digital Transformation
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital Transformation
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
GitHub Copilot.pptx
GitHub Copilot.pptxGitHub Copilot.pptx
GitHub Copilot.pptx
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
 
Data Architecture for Solutions.pdf
Data Architecture for Solutions.pdfData Architecture for Solutions.pdf
Data Architecture for Solutions.pdf
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Best Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data PipelinesBest Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data Pipelines
 
Screw DevOps, Let's Talk DataOps
Screw DevOps, Let's Talk DataOpsScrew DevOps, Let's Talk DataOps
Screw DevOps, Let's Talk DataOps
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
 
What is MLOps
What is MLOpsWhat is MLOps
What is MLOps
 
Github Copilot vs Amazon CodeWhisperer for Java developers at JCON 2023
Github Copilot vs Amazon CodeWhisperer for Java developers at JCON 2023Github Copilot vs Amazon CodeWhisperer for Java developers at JCON 2023
Github Copilot vs Amazon CodeWhisperer for Java developers at JCON 2023
 
Ibm data governance framework
Ibm data governance frameworkIbm data governance framework
Ibm data governance framework
 
Data Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesData Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & Approaches
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
 

Ähnlich wie Washington DC DataOps Meetup -- Nov 2019

Fri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataopsFri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataopsDataKitchen
 
ODSC data science to DataOps
ODSC data science to DataOpsODSC data science to DataOps
ODSC data science to DataOpsChristopher Bergh
 
Your Data Nerd Friends Need You!
Your Data Nerd Friends Need You!Your Data Nerd Friends Need You!
Your Data Nerd Friends Need You! DataKitchen
 
Strata+hadoop data kitchen-seven-steps-to-high-velocity-data-analytics-with d...
Strata+hadoop data kitchen-seven-steps-to-high-velocity-data-analytics-with d...Strata+hadoop data kitchen-seven-steps-to-high-velocity-data-analytics-with d...
Strata+hadoop data kitchen-seven-steps-to-high-velocity-data-analytics-with d...DataKitchen
 
Building a Data Platform Strata SF 2019
Building a Data Platform Strata SF 2019Building a Data Platform Strata SF 2019
Building a Data Platform Strata SF 2019mark madsen
 
Self-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsSelf-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsDenodo
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoptionHortonworks
 
Big Data Developer Career Path: Job & Interview Preparation
Big Data Developer Career Path: Job & Interview PreparationBig Data Developer Career Path: Job & Interview Preparation
Big Data Developer Career Path: Job & Interview PreparationIntellipaat
 
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?SnapLogic
 
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...ModusOptimum
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data editionMark Kerzner
 
Embedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderEmbedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderDataconomy Media
 
Tdwi march 2015 presentation
Tdwi march 2015 presentationTdwi march 2015 presentation
Tdwi march 2015 presentationAlison Macfie
 
Role of Data in Digital Transformation
Role of Data in Digital TransformationRole of Data in Digital Transformation
Role of Data in Digital TransformationVMware Tanzu
 
The Data Lake: Empowering Your Data Science Team
The Data Lake: Empowering Your Data Science TeamThe Data Lake: Empowering Your Data Science Team
The Data Lake: Empowering Your Data Science TeamSenturus
 
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...Elemica
 
Data summit connect fall 2020 - rise of data ops
Data summit connect fall 2020 - rise of data opsData summit connect fall 2020 - rise of data ops
Data summit connect fall 2020 - rise of data opsRyan Gross
 
Intro of Key Features of Soft CAAT Ent Software
Intro of Key Features of Soft CAAT Ent SoftwareIntro of Key Features of Soft CAAT Ent Software
Intro of Key Features of Soft CAAT Ent Softwarerafeq
 
12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics
12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics
12Nov13 Webinar: Big Data Analysis with Teradata and Revolution AnalyticsRevolution Analytics
 

Ähnlich wie Washington DC DataOps Meetup -- Nov 2019 (20)

Fri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataopsFri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataops
 
ODSC data science to DataOps
ODSC data science to DataOpsODSC data science to DataOps
ODSC data science to DataOps
 
Your Data Nerd Friends Need You!
Your Data Nerd Friends Need You!Your Data Nerd Friends Need You!
Your Data Nerd Friends Need You!
 
Strata+hadoop data kitchen-seven-steps-to-high-velocity-data-analytics-with d...
Strata+hadoop data kitchen-seven-steps-to-high-velocity-data-analytics-with d...Strata+hadoop data kitchen-seven-steps-to-high-velocity-data-analytics-with d...
Strata+hadoop data kitchen-seven-steps-to-high-velocity-data-analytics-with d...
 
Building a Data Platform Strata SF 2019
Building a Data Platform Strata SF 2019Building a Data Platform Strata SF 2019
Building a Data Platform Strata SF 2019
 
Self-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsSelf-Service Analytics with Guard Rails
Self-Service Analytics with Guard Rails
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
 
Big Data Developer Career Path: Job & Interview Preparation
Big Data Developer Career Path: Job & Interview PreparationBig Data Developer Career Path: Job & Interview Preparation
Big Data Developer Career Path: Job & Interview Preparation
 
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
 
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data edition
 
Embedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderEmbedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern Staender
 
Tdwi march 2015 presentation
Tdwi march 2015 presentationTdwi march 2015 presentation
Tdwi march 2015 presentation
 
Role of Data in Digital Transformation
Role of Data in Digital TransformationRole of Data in Digital Transformation
Role of Data in Digital Transformation
 
The Data Lake: Empowering Your Data Science Team
The Data Lake: Empowering Your Data Science TeamThe Data Lake: Empowering Your Data Science Team
The Data Lake: Empowering Your Data Science Team
 
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...
 
Data summit connect fall 2020 - rise of data ops
Data summit connect fall 2020 - rise of data opsData summit connect fall 2020 - rise of data ops
Data summit connect fall 2020 - rise of data ops
 
Intro of Key Features of Soft CAAT Ent Software
Intro of Key Features of Soft CAAT Ent SoftwareIntro of Key Features of Soft CAAT Ent Software
Intro of Key Features of Soft CAAT Ent Software
 
12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics
12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics
12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics
 

Mehr von DataKitchen

Data kitchen 7 agile steps - big data fest 9-18-2015
Data kitchen   7 agile steps - big data fest 9-18-2015Data kitchen   7 agile steps - big data fest 9-18-2015
Data kitchen 7 agile steps - big data fest 9-18-2015DataKitchen
 
Open Data Science Conference Agile Data
Open Data Science Conference Agile DataOpen Data Science Conference Agile Data
Open Data Science Conference Agile DataDataKitchen
 
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...DataKitchen
 
Do Agile Data in Just 5 Shocking Steps!
Do Agile Data in Just 5 Shocking Steps!Do Agile Data in Just 5 Shocking Steps!
Do Agile Data in Just 5 Shocking Steps!DataKitchen
 
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & RedshiftIntroduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & RedshiftDataKitchen
 
Redshift Introduction
Redshift IntroductionRedshift Introduction
Redshift IntroductionDataKitchen
 

Mehr von DataKitchen (7)

Data kitchen 7 agile steps - big data fest 9-18-2015
Data kitchen   7 agile steps - big data fest 9-18-2015Data kitchen   7 agile steps - big data fest 9-18-2015
Data kitchen 7 agile steps - big data fest 9-18-2015
 
Open Data Science Conference Agile Data
Open Data Science Conference Agile DataOpen Data Science Conference Agile Data
Open Data Science Conference Agile Data
 
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
 
Do Agile Data in Just 5 Shocking Steps!
Do Agile Data in Just 5 Shocking Steps!Do Agile Data in Just 5 Shocking Steps!
Do Agile Data in Just 5 Shocking Steps!
 
Amazon EMR
Amazon EMRAmazon EMR
Amazon EMR
 
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & RedshiftIntroduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & Redshift
 
Redshift Introduction
Redshift IntroductionRedshift Introduction
Redshift Introduction
 

Kürzlich hochgeladen

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 

Kürzlich hochgeladen (20)

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 

Washington DC DataOps Meetup -- Nov 2019

  • 2. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Topics Why DataOps Is Essential Seven Steps to DataOps (+3) Next Steps With DataOps
  • 3. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Data Analytics has a dirty secret: FAILURE Data Analytics is Hot: • ‘New Oil’ amount of data increasing fast • Buzz: Big Data, Data Science, Data Lakes, Machine Learning, AI, etc. With Rampant Failure: • 87% of data science projects never get to production. • Data analytics investment up, yet “data driven” organizations down 37% to 31% • 60% of all data analytic projects fail • 79% of data projects have too many errors
  • 4. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Challenge: Your Team’s Time Not Well Spent Percentage Time Team Spends Per Week Current Errors & Operational Tasks New Features & Data For Customers Improvements & Debt Challenges: • Many people involved • Complex toolchains • Innovative insights cannot be delivered at the speed of business • Collaboration and coordination across teams, locations, and clouds/data centers
  • 5. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Challenge: Data Analytics is like the US Auto Industry in the 1970s Current High Errors Production Errors Data Analytics Team Deployment Latency Weeks, Months Dev Prod Challenges: • Slow to add new features, rapidly address consumer requests, changing data sets • Errors in dashboards, reports, and data pipelines - lack of trust • Lacking pipeline execution and visibility on premise, multi- cloud, etc. • Team morale
  • 6. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. There are lots of people who work with data
  • 7. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. They work in teams
  • 8. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. With a massive, fragmented toolchain Data Sources and/or Data Lake ETL Tools (Informatica, Talend, etc.) Databases (Redshift, SQL Server, etc.) Data Science Tools (Python, DataIku, etc.) Data Catalog Tools (Alation, wiki) Data Visualization Tools (Tableau, etc)
  • 9. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. They work in teams together
  • 10. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. They work in teams together
  • 11. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. They may work for the same boss Chief Data Officer Chief Analytics Officer
  • 12. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Or not CIO or CDO Line of Business Executives CEO
  • 13. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. A many to many dev/ops relationship Data Specific Production Team Do Operations Themselves Some other Ops team “DEV” “OPS”
  • 14. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. And they source data from internal and external system CRM ERP Supply Chain Website Financial HR Open Data Syndicated Databases APIs Files Internal and External Systems/Sources
  • 15. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. They run a ‘Factory’ of Insight CRM ERP Supply Chain Website Financial HR Open Data Syndicated Databases APIs Files Access: Python Code Transform: SQL Code, ETL Model: R Code Visualize: Tableau Workbook Report: Tableau Online
  • 16. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Currently, Teams Have High Errors DataKitchen/Eckerson Survey (May 2019) Forthcoming DataKitchen / Eckerson Research Survey of Medium – Large Companies US And Abroad
  • 17. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Currently, Teams Struggle to Deploy DataKitchen/Eckerson Survey (May 2019) Forthcoming DataKitchen / Eckerson Research Survey of Medium – Large Companies US And Abroad
  • 18. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. DevOps has resulted in a transformative improvement in Software Development • High-performing IT organizations deploy 200 times more frequently • They have 24 times faster recovery times and three times lower change failure rates • And they spend 22 percent less time on unplanned work and rework Source: State of DevOps Report
  • 19. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Lean has resulted in a transformative improvement in manufacturing • Lean manufacturing improves efficiency, reduces waste, and increases productivity. • The benefits are manifold: • Increased product quality • Reduces rework • Employee satisfaction • Higher profits
  • 20. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Why are world class, game changing data analytics so difficult? Challenges: • Innovative insights cannot be delivered at the speed of business • Data Quality can be compromised • New innovative technologies are difficult to test, deploy and leverage Dichotomies: • Develop & protect production • Experimentation & reproducibility • Central control & self service • Group sharing & individual control • Reuse & branching Key Observation: Innovation Requires Iteration, And Iteration Is Hard
  • 21. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. From To Change Fear Change Velocity Manual Operations Automated Operations Hope For Quality Integrated Quality Hero Mentality Repeatable Processes Tool Centric Code Centric Vendor Lock-In Diverse Tools How To Succeed? A Mindset Change to DataOps… …to power your highly agile data culture.
  • 22. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. DataOps – Transformative to Data Analytics DataOps – The technical practices, cultural norms, and architecture that enable: • Rapid experimentation and innovation for the fastest delivery of new insights to our customers • Low error rates • Collaboration across complex sets of people, technology, and environments • Clear measurement and monitoring of results Source: Gartner “Organizations that adopt a DevOps- and DataOps-based approach are more successful in implementing end-to-end, reliable, robust, scalable and repeatable solutions.” Sumit Pal, Gartner, November 2018 People, Process, Organization Technical Environment
  • 23. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Topics Why DataOps Is Essential Seven Steps to DataOps Next Steps With DataOps
  • 24. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. What to do? Seven Steps to DataOps (+3) 1. Orchestrate Two Journeys 2. Add Tests And Monitoring 3. Use a Version Control System 4. Branch and Merge 5. Use Multiple Environments 6. Reuse & Containerize 7. Parameterize Your Processing + Three (Architecture, Metrics and Inter/Intra Team Collaboration)
  • 25. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Orchestrate data to customer value Analytic process are like manufacturing: materials (data) and production outputs (refined data, charts, graphs, model) Access: Python Code Transform: SQL Code, ETL Model: R Code Visualize: Tableau Workbook Report: Tableau Online ❶
  • 26. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Speed deployment to production Analytic processes are like software development: deliverables continually move from development to production Data Engineers Data Scientists Data Analysts Diverse Team Diverse Tools Diverse Customers Business Customer Products & Systems ❶
  • 27. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Innovation and Value Pipeline Together Focus on both orchestration and deployment while automating & monitoring quality Don’t want break production when I deploy my changes Don’t want to learn about data quality issues from my customers ❶
  • 28. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Add Automated Monitoring And Tests Move Fast and Count Things Monitoring: To ensure that during in the Value Pipeline, the data quality remains high. Tests: Before promoting work, running new and old tests gives high confidence that the change did not break anything in the Innovation Pipeline ❷
  • 29. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Automate Monitoring & Tests In Production Test Every Step And Every Tool in Your Value Pipeline Are your outputs consistent? And Save Test Results! Are data inputs free from issues? Is your business logic still correct? Access: Python Code Transform: SQL Code, ETL Model: R Code Visualize: Tableau Workbook Report: Tableau Online ❷
  • 30. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Support Multiple Types Of Tests Testing Data Is Not Just Pass/Fail in Your Value Pipeline Support Test Types • Error – stop the line • Warning – investigate later • Info – list of changes Keep Test History • Statistical Process Control ❷
  • 31. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Example Tests (Basic) ❷
  • 32. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Example Tests Simple ❷
  • 33. Example Test More Complex Make sure all table counts are the same in the production and development environment ❷
  • 34. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Example Test (Location Balance) ❷ Access: Python Code Transform: SQL Code, ETL Model: R Code Visualize: Tableau Workbook Report: Tableau Online source 1 million rows database 1 million rows 300K facts 700K dimensions report 300K facts 700K dimensions
  • 35. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Example Test (Historical Balance) ❷ SKU Product Product Group Volume SKU1 P1 G1 100 SKU2 P1 50 SKU3 P2 75 SKU4 P3 G2 125 SKU5 P4 200 SKU6 P5 25 575 Production Data, Pipeline & Environment Pre-Production Data, Pipeline & Environment SKU Product Product Group Volume SKU1 P1 G1 101 SKU2 P1 55 SKU3 P2 76 SKU4 P3 126 SKU5 P4 G2 200 SKU6 P5 29 587 Access: Python Code Transfor m: SQL Code, ETL Model: R Code Visualiz e: Tableau Workbook Report : Tableau Online Access: Python Code Transfor m: SQL Code, ETL Model: R Code Visualiz e: Tableau Workbook Report : Tableau Online Histbal G1 225 G2 350 Histbal G1 358 G2 229
  • 36. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Production Testing and Monitoring Lower Your Error Rates and Embarrassment! [HOW] Test and Monitor: • Automatically, in production • On top of the entire tool chain • Send alerts / notification • Keep track of history • Make it easy to create tests [WHAT] Test Types: • Traditional Data Quality • Statistical Process Control • Location Balance Test • Historic Balance Test • Business Based Tests [WHY] Benefits: • Less Errors • More Innovation • More Customer Data Trust • Less Stress • Less Embarrassment
  • 37. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. For the Innovation Pipeline Tests Are For Also Code: Keep Data Fixed Deploy Feature Run all tests here before promoting ❷
  • 38. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Automated ‘Tests’ Serve a Dual Purpose: 1. Data Tests and Monitoring in Production 2. Regression, Functional and Performance Tests in Development Data Fixed Data Variable Code Fixed Value Pipeline Code Variable Innovation Pipeline Quality Your Customer Receives = f (data, code) https://medium.com/data-ops/disband-your-impact-review- board-automate-analytics-testing-42093d09fe11 Duality of Tests❷
  • 39. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Use a Version Control System At The End Of The Day, Analytic Work Is All Just Code Access: Python Code Transform: SQL Code, ETL Code Model: R Code Visualize: Tableau Workbook XML Report: Tableau Online Source Code Control ❸
  • 40. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Branch & Merge Source Code Control Branching & Merging enables people to safely work on their own tasks ❹
  • 41. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Example Branch And Merge Pattern Sprint 1 Sprint 2 f1 f2 f3 main / master / trunk f5 ❹
  • 42. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Access: Python Code Transform: SQL Code, ETL Code Model: R Code Visualize: Tableau Workbook XML Report: Tableau Online Use Multiple Environments Analytic Environment Your Analytic Work Requires Coordinating Tools And Hardware ❺
  • 43. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Use Multiple Environments Provide an Analytic Environment for each branch • Analysts need a controlled environment for their experiments • Engineers need a place to develop outside of production • Update Production only after all tests are run! ❺
  • 44. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Sandboxes Are Complex Analytic Environment ❺ Data Engineers, Scientists or Analytics Team’s Analytic Tools R (model) Alteryx (business ETL) Redshift (data) SQL (ETL) Hardware & Network Configurations Right Hardware and Software Versions Tableau (workbook) Python Test Data Sets Code Branch Test Result History Analytic Environment/ Development Sandbox Creation is Complex: Hard to create the right set of data, tools, people, history and configuration for a fast build test debug cycle
  • 45. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Reuse & Containerize Containerize 1. Manage the environment for each component (e.g. Docker, AMI) 2. Practice Environment Version Control Reuse 1. Do not create one ‘monolith’ of code 2. Reuse the code and results ❻
  • 46. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Parameterize Your Processing Think Of Your Value Pipeline Like A Big Function • Named sets of parameters will increase your velocity • With parameters, you can vary” • Inputs • Outputs • Steps in the workflow • You can make a time machine • Secure storage for credentials ❼
  • 47. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. The Seven Steps In Action 1. Select story 2. Create branch 3. Create environment 4. Implement feature 5. Write new tests 6. Run new and existing tests 7. Check in to branch 8. Merge to parent 9. Delete environment When sprint ends • Deliver all completed features to customer • Merge sprint branch to master • Roll un-merged features into the next sprint
  • 48. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Three Bonus Steps! • DataOps Data Architecture • DataOps Collaboration • Inter and Intra Team • DataOps Measurement
  • 49. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Why a DataOps Centric Architecture? • Canonical Data Architectures only think about production, not the process to make changes to production. • A DataOps Data Architecture makes the steps to change what is in production a “central idea.” • Think first about changes over time to your code, your servers, your tools, and monitoring for errors are first class citizens in the design. • Why? • It is a little like designing a mobile phone with a fixed battery. • You can end up with processes characterized by unplanned work, manual deployment, errors, and bureaucracy. • A ‘Right to Repair’ Data Architecture
  • 50. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Canonical Data Architecture Does not reflect collaboration and operations Production Environment Source Data Data Customers Raw Lake Data Engine- ering Refined Data Data Science Data Viz. Data Govern- ance
  • 51. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. DataOps Data Architecture Spans tool chain & environments Cloud/On-Prem Production EnvironmentTest Dev Source Data Data Customers Raw Lake Data Engine- ering Refined Data Data Science Data Viz. Data Govern- ance Orchestrate, Monitor, Test Orchestrate, Monitor, Test Orchestrate, Monitor, Test DataKitchen DataOps Storage & Version Control History & Metadata Auth & Permissions Environ- ment Secrets DataOps Metrics & Reports Automated Deployment EnvironmentCreation andManagement. DataOps Team
  • 52. Chris – DataOps Engineer Eric – Production Engineer Betty – Data Engineer This Is A Multi Step, Multi Person, Multi Environment Process To Make this Request a Reality Challenges: • How to leverage best practices and re-use? • How to collaborate and coordinate work? • How to ease movement between team members with many tools and environments? • How to maintain security? • How to automate work and reduce manual errors? DataOps and Intra-Team Coordination Pat – Data Scientist
  • 53. Chris – DataOps Engineer Production Environment: • Separate Hardware/Software Environment • Secure • No Access By Developers • Managed by Eric • Separate Credentials PRODUCTION DEVELOPMENT Development Environment: • Separate Hardware/Software Environment • Secure • Access By Data Engineers, Data Scientists, Analysts and DataOps Engineers • Setup by Chris (DataOps Engineer) • Separate Credentials Eric – Production Engineer Betty – Data Engineer Intra-Team Coordination Pat – Data Scientist
  • 54. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Inter-Team Coordination Multiple organizations, perhaps multiple bosses
  • 55. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Inter-Team Coordination: Two Locations, Multiple Tools Home Office Team Local Office ‘Self- Service’ Team VP Marketing Data Engineer Data Scientist Centralized, Weekly Cadence of Changes Data Analyst Distributed, Daily/Hourly Cadence of Changes Boston New Jersey
  • 56. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Challenges With Coordination Data Engineer Data Scientist Data Analyst Make a change in schema? Break Reports? Add New Data SetsNot Available For All? Change Report Calculations Inconsistencies? New Data & Schema Update/New Report Not Working? Home Office Team Local Office ‘Self- Service’ Team VP Marketing
  • 57. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Shared Result, Separate Responsibilities Home Office Team Data Engineer Data Scientist Local Office ‘Self- Service’ Team Calculate: SQL Segment: Python Transform SSIS Load SQLServer Deploy: Tableau Publish: T Server Add Data Alteryx Data Analyst VP Marketing
  • 58. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Overall Orchestration Home Office Team Data Engineer Data Scientist Local Office ‘Self- Service’ Team Calculate: SQL Segment: Python Transform SSIS Load SQLServer Deploy: Tableau Publish: T Server Add Data Alteryx Data Analyst VP Marketing
  • 59. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. DataOps Process Analytics Analytic teams are not very analytic about measuring and improving their internal work • Prove your teams’ value, measure: • Team and individual productivity • Production error rates • Data provider error rates • SLAs • Production deployment rates • Release environments • Tests Coverage • Customizable with data export to fit your company’s needs
  • 60. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. DataOps: data about data Statistical process control graphs monitoring “bad IDs” and raw row counts
  • 61. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Error Rates Decline in Production On Time Delivery within SLA, & decreasing build time Per Project Analytics Productivity: Recipe Work Increasing Team Collaboration Increased Deploys Between Environments Increased Number of Automated Tests Increasing
  • 62. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Topics Why DataOps Is Essential Seven Steps to DataOps (+3) Next Steps With DataOps
  • 63. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. DataOps Benefit: Time Well Spent After DataOps Percentage Time Team Spends Per Week Before DataOps New Features & Data For Customers Errors & Operational Tasks New Features & Data For Customers Improvements & Debt Errors & Operational Tasks Process Improvements & Tech Debt Reduction
  • 64. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. DataOps Benefit: Faster, Better & Happier After DataOpsBefore DataOps High Errors Production Errors Low Errors Data Analytics Team Deployment Latency Weeks, Months Dev Prod Hours & Mins Dev Prod
  • 65. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Write down your answer What can I take from this session and apply tomorrow?
  • 66. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Where to Start With DataOps? Look to manufacturing/DevOps ‘Theory of Constraints’ • Where are ‘bottlenecks’ (or constraints in your data science or analytic process? • What impedes from creating new insight for you customers? • Iterate & improve
  • 67. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Where to Start Part 1: [Factory] “I don’t want to learn about data quality issues from my customers” Part 2: [Flow] “I don’t want break production when I deploy my changes” Part 3: [Intra-team coordination] “I don’t want my team to struggle working together” Part 4: [Inter-team coordination] “I don’t like the Hatfields vs Mccoys war between different analytic teams” Part 5: [Management] “How to measure team progress with and show results to leadership?” Errors : Bottleneck / Constraint Deployment : Bottleneck / Constraint Team Coordination : Bottleneck / Constraint
  • 68. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Where to start Part 1: [Factory] “I don’t want to learn about data quality issues from my customers” Part 2: [Flow] “I don’t want break production when I deploy my changes” Part 3: [Intra-team coordination] “I don’t want my team to struggle working together” Part 4: [Inter-team coordination] “I don’t like the Hatfields vs Mccoys war between different analytic teams” Part 5: [Management] “How to measure team progress with and show results to leadership?” Errors, Deployment, and Team Coordination Are All Bottlenecks That GOAL: Flow of Innovation
  • 69. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. DataKitchen Software Platform Our cloud platform orchestrates data to customer value, speeds features to production, and automates quality. Kitchens Recipes & Tests Orders Ingredients 1. Orchestrate Two Journeys 2. Add Tests And Monitoring 3. Use a Version Control System 4. Branch and Merge 5. Use Multiple Environments 6. Reuse & Containerize 7. Parameterize Your Processing + 3 Bonus Steps
  • 70. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Learn More about DataKitchen & DataOps • For these slides, contact me: • cbergh at datakitchen dot io • DataOps Manifesto: • http://dataopsmanifesto.org • Free DataOps Cookbook: • https://www.datakitchen.io/dataops- cookbook-main.html • Excerpt from New Unicorn Project Book on DataOps • https://www.datakitchen.io/unicorn- project.html