Data Collaboration Stack:
From DataOps to MLOps
Pierre Brunelle, 2022 Rev3 - NYC @pjlbrunelle
|
Cloud-based technologies centered on data to empower users to explore and use data…
1 Ideal Modern Data Stack
Building Blocks & Best-of-Breed Approach
● … BI
● [Reverse] E(L)TL
● Workspace
● No-code
● Catalog & Governance
● Modeling
● Warehouse, Lake, & Mesh...
● Spreadsheet …
● Feature
● Metrics
|
2 Reality Check: Modern Data Stack
|
4 Most painful issues when interacting with data, in order of priority
Data Quality Issues
Difficulty accessing data and insufficient quantity
Explainability
Lack of ETL Automation / Data Warehousing Issues
Convincing Stakeholders
Reproducibility
Insufficient Hardware
Unsure of best approach or technique to use…?
Need to be able to iterate quickly
|
5 What the Data Collaboration Stack addresses:
Data Quality Issues
Difficulty accessing data and insufficient quantity
Explainability
Lack of ETL Automation / Data Warehousing Issues
Convincing Stakeholders
Reproducibility
Insufficient Hardware
Unsure of best approach or technique to use…?
Need to be able to iterate quickly
|
4 Head Full of Fresh Ops: Smooth out the Data Workflows and Processes
|
4 A Simplified Data Science Workflow
Data Collection → Data Cleaning and Labeling → Feature Engineering (Preparation, Selection) → Modeling (Optimization, Ensembling, Validation) → Productionization (Deployment, Monitoring, Improvement)
Model code is typically only a small fraction (roughly 5-10%) of a machine learning solution.
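The early stages of this workflow (collection, cleaning and labeling, feature engineering, modeling, validation) can be sketched in Python as plain functions chained together. This is a toy illustration under stated assumptions: the records, the field names `sqft` and `price`, and the one-feature least-squares model are all hypothetical, and deployment, monitoring, and improvement are not shown.

```python
from statistics import mean


def collect():
    # Data Collection: hypothetical records standing in for a real source system.
    return [
        {"sqft": 700, "price": 210.0},
        {"sqft": 1200, "price": 350.0},
        {"sqft": None, "price": 330.0},  # missing feature value
        {"sqft": 1500, "price": None},   # missing label
        {"sqft": 900, "price": 270.0},
    ]


def clean(rows):
    # Data Cleaning and Labeling: drop rows without a feature value or a label.
    return [r for r in rows if r["sqft"] is not None and r["price"] is not None]


def engineer_features(rows):
    # Feature Engineering: prepare (cast to float) and select (keep only sqft).
    return [(float(r["sqft"]), float(r["price"])) for r in rows]


def fit(pairs):
    # Modeling: one-feature ordinary least squares, y = a + b * x.
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in pairs) / sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    return lambda x: a + b * x


def validate(model, pairs):
    # Validation: mean absolute error (a real project would use held-out data).
    return mean(abs(model(x) - y) for x, y in pairs)


if __name__ == "__main__":
    pairs = engineer_features(clean(collect()))
    model = fit(pairs)
    print(f"rows kept: {len(pairs)}, MAE: {validate(model, pairs):.2f}")
```

Keeping each stage as a separate function is one way to let different people own different steps (for example, a data engineer owning `clean` while a data scientist owns `fit`).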
|
4 Addressing the Skill Gap for Data Science through Collaboration
Skills: Analytics & Visualization, Statistics & Mathematics, Computer Science, Domain Expertise, Machine Learning
Roles: Analyst, Data Scientist, Engineer, Researcher, PM/Business
|
4 A Typical Data Science Project
https://arxiv.org/pdf/2001.06684.pdf
|
4 Collaboration at Amazon Core AI (Amazon Artificial Intelligence Group)
● Price Elasticities
● Economic Impact of Abusive Behavior
● Deep learning to describe products and services
● Debiasing techniques
● Demand across geography to minimize transportation costs
● “Image” scanning with Optical Character Recognition (OCR)
● Multi-armed bandit algorithm to improve predicted revenue
● Methods for inventory management
Stakeholders around the data workflow: Data Engineers, Product, Customer, Decision Makers, Partners, Legal, Economists, Data Scientists
|
4 Data-Centric AI (Garbage In, Garbage Out) and Bayesian Networks Require Collaboration
Bayesian networks provide:
● A visualization of the model's structure that can motivate the design of new models.
● Insights into the presence and absence of relationships between random variables.
● A way to structure complex probability calculations.
Bayesian networks require answers to:
● What are the random variables in the problem?
● What are the conditional relationships between the variables?
● What are the probability distributions for each variable?
Subject-Matter Experts (SMEs) are integral to the development process.
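To make those three questions concrete, here is a minimal Python sketch of exact inference by enumeration in a tiny Bayesian network. Everything in it is a hypothetical, illustrative assumption rather than material from the talk: the random variables (Rain, Sprinkler, WetGrass), the edges (Rain → Sprinkler, and both → WetGrass), and every probability, which stand in for values an SME would supply.

```python
from itertools import product

# Hypothetical SME-supplied distributions (illustrative numbers only).
P_RAIN = {True: 0.2, False: 0.8}                   # P(Rain)
P_SPRINKLER = {True: {True: 0.01, False: 0.99},    # P(Sprinkler | Rain=True)
               False: {True: 0.4, False: 0.6}}     # P(Sprinkler | Rain=False)
P_WET = {(True, True): 0.99, (True, False): 0.9,   # P(WetGrass=True | Sprinkler, Rain)
         (False, True): 0.8, (False, False): 0.0}


def joint(rain, sprinkler, wet):
    """Joint probability, factorized along the network's edges."""
    p_wet = P_WET[(sprinkler, rain)]
    return P_RAIN[rain] * P_SPRINKLER[rain][sprinkler] * (p_wet if wet else 1.0 - p_wet)


def p_rain_given_wet():
    """P(Rain=True | WetGrass=True), summing out the unobserved Sprinkler variable."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numerator / evidence


if __name__ == "__main__":
    print(round(p_rain_given_wet(), 4))  # 0.3577
```

The structure answers the first two questions and the tables answer the third; if an SME revises one table, only that factor in `joint` changes. Enumeration grows exponentially with the number of variables, so larger networks would normally use a dedicated inference library.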
|
4 Collaboration is required at every single step
Data (science) teams are extremely
collaborative and work with a variety
of stakeholders and tools
|
5 Examples: How Does Collaboration Take Place?
Data Scientist: Having members of the same team work simultaneously on the same notebook document
Finance Analyst: Having versioned reports that others can reuse
Data Engineer: Having running job statuses communicated to, and shareable with, many stakeholders
Data Scientist: Having a notion of ownership around artifacts (data, code, and models)
Data Scientist: Having the ability to rapidly clone and reproduce experiments
ML Engineer: Having the ability to search, browse, and organize code, data, and models
Collaboration as Simple Rules
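Two of these needs, ownership around artifacts and rapidly cloning and reproducing experiments, can be sketched in Python under a few assumptions. The `Experiment` record, the `fingerprint`, `clone`, and `run` helpers, and the `dataset_version` field are all hypothetical; `run` is a seeded simulation standing in for real training.

```python
import hashlib
import json
import random
from dataclasses import asdict, dataclass, field, replace


@dataclass(frozen=True)
class Experiment:
    owner: str               # who is accountable for this artifact
    dataset_version: str     # hypothetical pointer to a versioned dataset
    seed: int                # fixes all randomness in the run
    params: dict = field(default_factory=dict)


def fingerprint(exp):
    """Hash everything that affects results; ownership is deliberately excluded."""
    config = {k: v for k, v in asdict(exp).items() if k != "owner"}
    return hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()[:12]


def clone(exp, new_owner, **overrides):
    """Copy an experiment under a new owner, optionally overriding parameters."""
    return replace(exp, owner=new_owner, params={**exp.params, **overrides})


def run(exp):
    """Stand-in for training: a seeded simulation, so equal configs give equal results."""
    rng = random.Random(exp.seed)
    samples = [rng.gauss(0.0, 1.0) for _ in range(exp.params.get("n", 100))]
    return sum(samples) / len(samples)
```

For example, `clone(original, "bob")` keeps the same fingerprint and reproduces the same result, while `clone(original, "bob", n=200)` produces a new fingerprint. Excluding `owner` from the hash is a design choice: handing an experiment to a colleague should not make it look like a different experiment.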
|
4 From Data To Wisdom
Any Data Workflow…
Gather → Clean → Transform → Explore → Represent → Prescribe → Present → Decide
Data → Information → Knowledge → Insight → Wisdom
|
4 Collaborative Data Workflows
Collaboration in…
● Data Engineering
● Data Analytics
● Data Science
● Data Visualization
Data Workflow: Gather → Clean → Transform → Explore → Represent → Prescribe → Present → Decide
|
4 Collaborative Data Ecosystem
Teams A, B, C, D, E, and F
Roles: Maintainers, Producers, Consumers
@kafonek
|
4 Pierre’s Collaborative Modern Data Stack
Eliminate Data Silos:
● Discover Data
● Share Across
● Secure Governance
● Control Workflows
● Personalized Views
Layers:
Infrastructure
Storage, Access, & Transformation
Management, Governance, & Observability
Explore, Analyze, & Publish
|
5 Collaboration as Simple Rules
Key Elements:
● Import & Export
● Search & Navigation
● Annotation (e.g., Comment, Tagging…)
● User Segmentation
● Support (at least) asynchronous teamwork
● Content Management & Sharing (e.g., Version Control, Change…)
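Several of these key elements can be sketched together as a minimal in-memory workspace in Python (3.9+): versioned content, annotation through comments and tags, search, user segmentation by group, and export to JSON. The `Artifact` and `Workspace` classes and their methods are hypothetical illustrations, not the API of any particular product; a real stack would persist and sync this state.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Artifact:
    name: str
    owner: str
    audience: set[str]                                    # user segmentation: groups allowed to see it
    versions: list[str] = field(default_factory=list)     # content management: version history
    tags: set[str] = field(default_factory=set)           # annotation: tagging
    comments: list[tuple[str, str]] = field(default_factory=list)  # annotation: (user, text)


class Workspace:
    """Hypothetical in-memory workspace shared by several collaborators."""

    def __init__(self):
        self._artifacts: dict[str, Artifact] = {}

    def publish(self, name, owner, content, audience):
        """Append a new version; owner and audience are set on first publish only."""
        art = self._artifacts.setdefault(name, Artifact(name, owner, set(audience)))
        art.versions.append(content)
        return len(art.versions)

    def annotate(self, name, user, comment=None, tags=()):
        """Attach an optional comment and any number of tags to an artifact."""
        art = self._artifacts[name]
        if comment:
            art.comments.append((user, comment))
        art.tags.update(tags)

    def search(self, query, group):
        """Match name substrings or exact tags, limited to artifacts visible to `group`."""
        q = query.lower()
        return sorted(
            a.name
            for a in self._artifacts.values()
            if group in a.audience and (q in a.name.lower() or q in {t.lower() for t in a.tags})
        )

    def export(self, name):
        """Export the latest version plus annotations as JSON for another tool to import."""
        a = self._artifacts[name]
        return json.dumps({
            "name": a.name,
            "owner": a.owner,
            "version": len(a.versions),
            "latest": a.versions[-1],
            "tags": sorted(a.tags),
            "comments": a.comments,
        }, sort_keys=True)
```

Asynchronous teamwork is only implied here: collaborators append versions and comments without coordinating in real time, and the version history keeps earlier work recoverable.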
|
5 Which Modern Data Stack Is It?
Infrastructure
Storage, Access, & Transformation
Management, Governance, &
Observability
Explore, Analyze, & Publish
|
5 Example: Pierre’s Online E-Commerce Modern Data Stack
|
5 Want to read about Data Collaboration…
“Companies that are in control of their own data generation are those who can get the quickest benefit out
of that data collaboration” - Blake Burch, CEO at Shipyard
“tools empowering data collaboration would come in handy.” - Eti Gwirtz, VP Product at GigaSpaces.
“readiness to experiment and engaging with multiple stakeholders across the organization with specific
roles but ones that need collaboration” - Akhilesh Ayer, EVP and Global Head at WNS Triange.
“It isn’t so much a matter of which industries stand to gain from data collaboration, but that most
businesses can optimize their performance and accuracy by embracing data collaboration” - James
Shalhoub, CEO at Finn
“Organizations can solve these challenges by improving cross-functional collaboration between team leaders
and their data team to make insights accessible to the broader team while also shining a light on the most
important metrics to analyze” - Ryan G. Smith, CEO at LeafLink
“For companies with one data person, the collaboration is happening with non-data people, so more of the
data collaboration would likely be around communications of the insights and actions that need to be taken.
Whereas in a technical organization, data collaboration may mean that team members are sharing a GitHub
account and sharing code, as well as putting the code through a review process. The data professionals in
these two instances have very different challenges to face” - Emad Hasan, CEO at Retina
Q&A
@pjlbrunelle

Weitere ähnliche Inhalte

Was ist angesagt?

Data Governance Best Practices
Data Governance Best PracticesData Governance Best Practices
Data Governance Best PracticesDATAVERSITY
 
Building a Data Governance Strategy
Building a Data Governance StrategyBuilding a Data Governance Strategy
Building a Data Governance StrategyAnalytics8
 
Building an integrated data strategy
Building an integrated data strategyBuilding an integrated data strategy
Building an integrated data strategyLucas Modesto
 
Review of Data Management Maturity Models
Review of Data Management Maturity ModelsReview of Data Management Maturity Models
Review of Data Management Maturity ModelsAlan McSweeney
 
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesPutting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesDATAVERSITY
 
RWDG Slides: Building a Data Governance Roadmap
RWDG Slides: Building a Data Governance RoadmapRWDG Slides: Building a Data Governance Roadmap
RWDG Slides: Building a Data Governance RoadmapDATAVERSITY
 
Data Architecture Strategies: Building an Enterprise Data Strategy – Where to...
Data Architecture Strategies: Building an Enterprise Data Strategy – Where to...Data Architecture Strategies: Building an Enterprise Data Strategy – Where to...
Data Architecture Strategies: Building an Enterprise Data Strategy – Where to...DATAVERSITY
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture DesignKujambu Murugesan
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as ProductDATAVERSITY
 
data-mesh-101.pptx
data-mesh-101.pptxdata-mesh-101.pptx
data-mesh-101.pptxTarekHamdi8
 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureDATAVERSITY
 
Data catalog
Data catalogData catalog
Data catalogiamtodor
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
 
Enterprise Data Architecture Deliverables
Enterprise Data Architecture DeliverablesEnterprise Data Architecture Deliverables
Enterprise Data Architecture DeliverablesLars E Martinsson
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogDATAVERSITY
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Denodo
 
Graph Databases – Benefits and Risks
Graph Databases – Benefits and RisksGraph Databases – Benefits and Risks
Graph Databases – Benefits and RisksDATAVERSITY
 
Neo4j GraphSummit London - The Path To Success With Graph Database and Data S...
Neo4j GraphSummit London - The Path To Success With Graph Database and Data S...Neo4j GraphSummit London - The Path To Success With Graph Database and Data S...
Neo4j GraphSummit London - The Path To Success With Graph Database and Data S...Neo4j
 

Was ist angesagt? (20)

Data Governance Best Practices
Data Governance Best PracticesData Governance Best Practices
Data Governance Best Practices
 
Building a Data Governance Strategy
Building a Data Governance StrategyBuilding a Data Governance Strategy
Building a Data Governance Strategy
 
Building an integrated data strategy
Building an integrated data strategyBuilding an integrated data strategy
Building an integrated data strategy
 
Review of Data Management Maturity Models
Review of Data Management Maturity ModelsReview of Data Management Maturity Models
Review of Data Management Maturity Models
 
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesPutting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
 
Data Mesh
Data MeshData Mesh
Data Mesh
 
RWDG Slides: Building a Data Governance Roadmap
RWDG Slides: Building a Data Governance RoadmapRWDG Slides: Building a Data Governance Roadmap
RWDG Slides: Building a Data Governance Roadmap
 
Data Architecture Strategies: Building an Enterprise Data Strategy – Where to...
Data Architecture Strategies: Building an Enterprise Data Strategy – Where to...Data Architecture Strategies: Building an Enterprise Data Strategy – Where to...
Data Architecture Strategies: Building an Enterprise Data Strategy – Where to...
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture Design
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as Product
 
data-mesh-101.pptx
data-mesh-101.pptxdata-mesh-101.pptx
data-mesh-101.pptx
 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data Architecture
 
Data catalog
Data catalogData catalog
Data catalog
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
 
Enterprise Data Architecture Deliverables
Enterprise Data Architecture DeliverablesEnterprise Data Architecture Deliverables
Enterprise Data Architecture Deliverables
 
Graph databases
Graph databasesGraph databases
Graph databases
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data Catalog
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)
 
Graph Databases – Benefits and Risks
Graph Databases – Benefits and RisksGraph Databases – Benefits and Risks
Graph Databases – Benefits and Risks
 
Neo4j GraphSummit London - The Path To Success With Graph Database and Data S...
Neo4j GraphSummit London - The Path To Success With Graph Database and Data S...Neo4j GraphSummit London - The Path To Success With Graph Database and Data S...
Neo4j GraphSummit London - The Path To Success With Graph Database and Data S...
 

Ähnlich wie Data Collaboration Stack

Implementing Data Mesh WP LTIMindtree White Paper
Implementing Data Mesh WP LTIMindtree White PaperImplementing Data Mesh WP LTIMindtree White Paper
Implementing Data Mesh WP LTIMindtree White Papershashanksalunkhe12
 
Data-Ed Online: Trends in Data Modeling
Data-Ed Online: Trends in Data ModelingData-Ed Online: Trends in Data Modeling
Data-Ed Online: Trends in Data ModelingDATAVERSITY
 
Data-Ed: Trends in Data Modeling
Data-Ed: Trends in Data ModelingData-Ed: Trends in Data Modeling
Data-Ed: Trends in Data ModelingData Blueprint
 
FAIR data_ Superior data visibility and reuse without warehousing.pdf
FAIR data_ Superior data visibility and reuse without warehousing.pdfFAIR data_ Superior data visibility and reuse without warehousing.pdf
FAIR data_ Superior data visibility and reuse without warehousing.pdfAlan Morrison
 
Are You Prepared For The Future Of Data Technologies?
Are You Prepared For The Future Of Data Technologies?Are You Prepared For The Future Of Data Technologies?
Are You Prepared For The Future Of Data Technologies?Dell World
 
Expert Big Data Tips
Expert Big Data TipsExpert Big Data Tips
Expert Big Data TipsQubole
 
Data Modeling Best Practices - Business & Technical Approaches
Data Modeling Best Practices - Business & Technical ApproachesData Modeling Best Practices - Business & Technical Approaches
Data Modeling Best Practices - Business & Technical ApproachesDATAVERSITY
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityPrecisely
 
Facilitating Collaborative Life Science Research in Commercial & Enterprise E...
Facilitating Collaborative Life Science Research in Commercial & Enterprise E...Facilitating Collaborative Life Science Research in Commercial & Enterprise E...
Facilitating Collaborative Life Science Research in Commercial & Enterprise E...Chris Dagdigian
 
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...Denodo
 
Data science presentation
Data science presentationData science presentation
Data science presentationMSDEVMTL
 
Explorasi Data untuk Peluang Bisnis dan Pengembangan Karir.pptx
Explorasi Data untuk Peluang Bisnis dan Pengembangan Karir.pptxExplorasi Data untuk Peluang Bisnis dan Pengembangan Karir.pptx
Explorasi Data untuk Peluang Bisnis dan Pengembangan Karir.pptxwindu19
 
The Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value ThereafterThe Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value ThereafterInside Analysis
 
Keynote: Graphs in Government_Lance Walter, CMO
Keynote:  Graphs in Government_Lance Walter, CMOKeynote:  Graphs in Government_Lance Walter, CMO
Keynote: Graphs in Government_Lance Walter, CMONeo4j
 
Building Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated DiscoveryBuilding Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated Discoveryadamkraut
 
Data Systems Integration & Business Value Pt. 1: Metadata
Data Systems Integration & Business Value Pt. 1: MetadataData Systems Integration & Business Value Pt. 1: Metadata
Data Systems Integration & Business Value Pt. 1: MetadataDATAVERSITY
 
Data-Ed: Data Systems Integration & Business Value PT. 1: Metadata
Data-Ed: Data Systems Integration & Business Value PT. 1: MetadataData-Ed: Data Systems Integration & Business Value PT. 1: Metadata
Data-Ed: Data Systems Integration & Business Value PT. 1: MetadataData Blueprint
 
Data sci sd-11.6.17
Data sci sd-11.6.17Data sci sd-11.6.17
Data sci sd-11.6.17Thinkful
 
Course 8 : How to start your big data project by Eric Rodriguez
Course 8 : How to start your big data project by Eric Rodriguez Course 8 : How to start your big data project by Eric Rodriguez
Course 8 : How to start your big data project by Eric Rodriguez Betacowork
 

Ähnlich wie Data Collaboration Stack (20)

Implementing Data Mesh WP LTIMindtree White Paper
Implementing Data Mesh WP LTIMindtree White PaperImplementing Data Mesh WP LTIMindtree White Paper
Implementing Data Mesh WP LTIMindtree White Paper
 
Data-Ed Online: Trends in Data Modeling
Data-Ed Online: Trends in Data ModelingData-Ed Online: Trends in Data Modeling
Data-Ed Online: Trends in Data Modeling
 
Data-Ed: Trends in Data Modeling
Data-Ed: Trends in Data ModelingData-Ed: Trends in Data Modeling
Data-Ed: Trends in Data Modeling
 
FAIR data_ Superior data visibility and reuse without warehousing.pdf
FAIR data_ Superior data visibility and reuse without warehousing.pdfFAIR data_ Superior data visibility and reuse without warehousing.pdf
FAIR data_ Superior data visibility and reuse without warehousing.pdf
 
Are You Prepared For The Future Of Data Technologies?
Are You Prepared For The Future Of Data Technologies?Are You Prepared For The Future Of Data Technologies?
Are You Prepared For The Future Of Data Technologies?
 
Expert Big Data Tips
Expert Big Data TipsExpert Big Data Tips
Expert Big Data Tips
 
Data Modeling Best Practices - Business & Technical Approaches
Data Modeling Best Practices - Business & Technical ApproachesData Modeling Best Practices - Business & Technical Approaches
Data Modeling Best Practices - Business & Technical Approaches
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data Quality
 
Facilitating Collaborative Life Science Research in Commercial & Enterprise E...
Facilitating Collaborative Life Science Research in Commercial & Enterprise E...Facilitating Collaborative Life Science Research in Commercial & Enterprise E...
Facilitating Collaborative Life Science Research in Commercial & Enterprise E...
 
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
 
Big Data & DS Analytics for PAARL
Big Data & DS Analytics for PAARLBig Data & DS Analytics for PAARL
Big Data & DS Analytics for PAARL
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 
Explorasi Data untuk Peluang Bisnis dan Pengembangan Karir.pptx
Explorasi Data untuk Peluang Bisnis dan Pengembangan Karir.pptxExplorasi Data untuk Peluang Bisnis dan Pengembangan Karir.pptx
Explorasi Data untuk Peluang Bisnis dan Pengembangan Karir.pptx
 
The Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value ThereafterThe Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value Thereafter
 
Keynote: Graphs in Government_Lance Walter, CMO
Keynote:  Graphs in Government_Lance Walter, CMOKeynote:  Graphs in Government_Lance Walter, CMO
Keynote: Graphs in Government_Lance Walter, CMO
 
Building Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated DiscoveryBuilding Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated Discovery
 
Data Systems Integration & Business Value Pt. 1: Metadata
Data Systems Integration & Business Value Pt. 1: MetadataData Systems Integration & Business Value Pt. 1: Metadata
Data Systems Integration & Business Value Pt. 1: Metadata
 
Data-Ed: Data Systems Integration & Business Value PT. 1: Metadata
Data-Ed: Data Systems Integration & Business Value PT. 1: MetadataData-Ed: Data Systems Integration & Business Value PT. 1: Metadata
Data-Ed: Data Systems Integration & Business Value PT. 1: Metadata
 
Data sci sd-11.6.17
Data sci sd-11.6.17Data sci sd-11.6.17
Data sci sd-11.6.17
 
Course 8 : How to start your big data project by Eric Rodriguez
Course 8 : How to start your big data project by Eric Rodriguez Course 8 : How to start your big data project by Eric Rodriguez
Course 8 : How to start your big data project by Eric Rodriguez
 

Kürzlich hochgeladen

Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...ssuserf63bd7
 
2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group MeetingAlison Pitt
 
edited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfedited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfgreat91
 
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...Amil baba
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Klinik Aborsi
 
Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfRobertoOcampo24
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeBoston Institute of Analytics
 
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证acoha1
 
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理pyhepag
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesBoston Institute of Analytics
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证acoha1
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证pwgnohujw
 
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一fztigerwe
 
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...BabaJohn3
 
Digital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
Digital Marketing Demystified: Expert Tips from Samantha Rae CoolbethDigital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
Digital Marketing Demystified: Expert Tips from Samantha Rae CoolbethSamantha Rae Coolbeth
 
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证ppy8zfkfm
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Valters Lauzums
 
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...ssuserf63bd7
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证zifhagzkk
 
社内勉強会資料  Mamba - A new era or ephemeral
社内勉強会資料   Mamba - A new era or ephemeral社内勉強会資料   Mamba - A new era or ephemeral
社内勉強会資料  Mamba - A new era or ephemeralNABLAS株式会社
 

Kürzlich hochgeladen (20)

Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
 
2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting
 
edited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfedited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdf
 
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
 
Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdf
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
 
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
 
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting Techniques
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
 
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
 
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
 
Digital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
Digital Marketing Demystified: Expert Tips from Samantha Rae CoolbethDigital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
Digital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
 
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
 
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
 
社内勉強会資料  Mamba - A new era or ephemeral
社内勉強会資料   Mamba - A new era or ephemeral社内勉強会資料   Mamba - A new era or ephemeral
社内勉強会資料  Mamba - A new era or ephemeral
 

Data Collaboration Stack

  • 1. Data Collaboration Stack: From DataOps to MLOps Pierre Brunelle, 2022 Rev3 - NYC @pjlbrunelle
  • 2. | Cloud-based technologies centered on data to empower users to explore and use data… 1 Ideal Modern Data Stack Building Blocks & Best-of-Breed Approach … BI [Reverse] E(L)TL Workspace No-code Catalog & Governance Modeling Warehouse, Lake, & Mesh... Spreadsheet … Feature Metrics
  • 3. | 2 Reality Check: Modern Data Stack
  • 4. Most painful issues when interacting with data, by order of priority: data quality issues; difficulty accessing data and insufficient quantity; explainability; lack of ETL automation / data warehousing issues; convincing stakeholders; reproducibility; insufficient hardware; uncertainty about the best approach or technique to use; the need to iterate quickly.
  • 5. What the Data Collaboration Stack addresses: data quality issues; difficulty accessing data and insufficient quantity; explainability; lack of ETL automation / data warehousing issues; convincing stakeholders; reproducibility; insufficient hardware; uncertainty about the best approach or technique to use; the need to iterate quickly.
  • 6. Head Full of Fresh Ops: Smooth Out the Data Workflows and Processes
  • 7. A Simplified Data Science Workflow: Data Collection, Data Cleaning and Labeling, Feature Engineering (Preparation, Selection), Modeling (Optimization, Ensembling, Validation, Improvement), Productionization (Deployment, Monitoring). Code is merely 5-10% of any machine learning solution.
  • 8. Addressing the Skill Gap for Data Science through Collaboration. Skills: Analytics & Visualization, Statistics & Mathematics, Computer Science, Domain Expertise, Machine Learning. Roles: Analyst, Data Scientist, Engineer, Researcher, PM/Business.
  • 9. A Typical Data Science Project (source: https://arxiv.org/pdf/2001.06684.pdf)
  • 10. Collaboration at Amazon Core AI (Amazon Artificial Intelligence Group). Projects: ● Price elasticities ● Economic impact of abusive behavior ● Deep learning to describe products and services ● Debiasing techniques ● Demand across geography to minimize transportation costs ● “Image” scanning with Optical Character Recognition (OCR) ● Multi-armed bandit algorithm to improve predicted revenue ● Methods for inventory management. Stakeholders around the data workflow: Data Engineers, Product, Customers, Decision Makers, Partners, Legal, Economists, Data Scientists.
  • 11. Data-Centric AI (Garbage I/O) and Bayesian Networks Require Collaboration. Bayesian networks provide: ● a visualization of the structure of the model that can motivate the design of new models ● insights into the presence and absence of relationships between random variables ● a way to structure complex probability calculations. They require answers to: ● What are the random variables in the problem? ● What are the conditional relationships between the variables? ● What is the probability distribution for each variable? Subject-Matter Experts (SMEs) are integral to the development process.
  • 12. Collaboration is required at every single step: data (science) teams are extremely collaborative and work with a variety of stakeholders and tools.
  • 13. Examples: How Does Collaboration Take Place? ● Data Scientist: having members of the same team work simultaneously on the same notebook ● Finance Analyst: having versioned reports that others can reuse ● Data Engineer: having the status of running jobs communicated to, and shareable with, many stakeholders ● Data Scientist: having a notion of ownership around artifacts (data, code, and models) ● Data Scientist: having the ability to rapidly clone and reproduce experiments ● ML Engineer: having the ability to search, browse, and organize code, data, and models
  • 14. Collaboration as Simple Rules Pierre Brunelle, 2022 Rev3 - NYC
  • 15. From Data to Wisdom: any data workflow moves through Gather → Clean → Transform → Explore → Represent → Prescribe → Present → Decide, turning Data into Information, Knowledge, Insight, and, finally, Wisdom.
  • 16. Collaborative Data Workflows: collaboration in data engineering, data analytics, data science, and data visualization, across every stage of the data workflow (Gather, Clean, Transform, Explore, Represent, Prescribe, Present, Decide).
  • 17. Collaborative Data Ecosystem: Teams A through F acting as maintainers, producers, and consumers (@kafonek)
  • 18. Pierre’s Collaborative Modern Data Stack: eliminate data silos. ● Discover Data ● Share Across ● Secure Governance ● Control Workflows ● Personalized Views. Layers: Infrastructure; Storage, Access, & Transformation; Management, Governance, & Observability; Explore, Analyze, & Publish.
  • 19. Collaboration as Simple Rules. Key elements: ● Import & Export ● Search & Navigation ● Annotation (e.g., comments, tagging…) ● User Segmentation ● Support for (at least) asynchronous teamwork ● Content Management & Sharing (e.g., version control, change…)
  • 20. What Modern Data Stack Is It? Infrastructure; Storage, Access, & Transformation; Management, Governance, & Observability; Explore, Analyze, & Publish
  • 21. | 5 Example: Pierre’s Online E-Commerce Modern Data Stack
  • 22. Want to Read About Data Collaboration? ● “Companies that are in control of their own data generation are those who can get the quickest benefit out of that data collaboration” - Blake Burch, CEO at Shipyard ● “tools empowering data collaboration would come in handy.” - Eti Gwirtz, VP Product at GigaSpaces ● “readiness to experiment and engaging with multiple stakeholders across the organization with specific roles but ones that need collaboration” - Akhilesh Ayer, EVP and Global Head at WNS Triange ● “It isn’t so much a matter of which industries stand to gain from data collaboration, but that most businesses can optimize their performance and accuracy by embracing data collaboration” - James Shalhoub, CEO at Finn ● “Organizations can solve these challenges by improving cross-functional collaboration between team leaders and their data team to make insights accessible to the broader team while also shining a light on the most important metrics to analyze” - Ryan G. Smith, CEO at LeafLink ● “For companies with one data person, the collaboration is happening with non-data people, so more of the data collaboration would likely be around communications of the insights and actions that need to be taken. Whereas in a technical organization, data collaboration may mean that team members are sharing a GitHub account and sharing code, as well as putting the code through a review process. The data professionals in these two instances have very different challenges to face” - Emad Hasan, CEO at Retina
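Slide 11 argues that a Bayesian network needs three answers before it is useful: which random variables exist, how they depend on one another, and what their distributions are. The Python sketch below (standard library only) tries to make that concrete: the graph fixes the conditional relationships, the lookup tables hold the distributions, and the chain-rule factorization is what "structures" the probability calculation. The shop scenario, the variable names, and every probability are hypothetical placeholders of the kind an SME would be asked to supply; inference here is plain enumeration, which is only practical for tiny networks.

```python
from itertools import product

# Hypothetical three-variable network for an online shop:
#   Promo -> HighDemand -> Stockout
# Every number below is an illustrative placeholder, the kind of value a
# subject-matter expert would be asked to supply or validate.
P_PROMO = {True: 0.3, False: 0.7}                      # P(Promo)
P_DEMAND = {True: {True: 0.8, False: 0.2},             # P(HighDemand | Promo)
            False: {True: 0.3, False: 0.7}}
P_STOCKOUT = {True: {True: 0.4, False: 0.6},           # P(Stockout | HighDemand)
              False: {True: 0.05, False: 0.95}}

BOOLS = (True, False)


def joint(promo, demand, stockout):
    """P(Promo, HighDemand, Stockout) via the chain rule the graph implies."""
    return (P_PROMO[promo]
            * P_DEMAND[promo][demand]
            * P_STOCKOUT[demand][stockout])


def prob_stockout(promo=None):
    """P(Stockout=True), or P(Stockout=True | Promo=promo), by exact enumeration."""
    promos = BOOLS if promo is None else (promo,)
    numerator = sum(joint(p, d, True) for p in promos for d in BOOLS)
    denominator = sum(joint(p, d, s)
                      for p in promos
                      for d, s in product(BOOLS, repeat=2))
    return numerator / denominator


print(round(prob_stockout(), 4))       # 0.2075
print(round(prob_stockout(True), 4))   # 0.33
print(round(prob_stockout(False), 4))  # 0.155
```

Changing a single expert-supplied entry (say, P(HighDemand | Promo)) changes every downstream answer, which is one way to see why the slide treats SMEs as integral to the development process. For networks beyond a handful of variables, a dedicated library with proper inference algorithms would be the usual choice.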

Editor's Notes

  1. DataOps (Data + DevOps): a set of practices to improve the quality and reduce the cycle time of data analytics. The main tasks in DataOps include data tagging, data testing, data pipeline orchestration, data versioning, and data monitoring. MLOps (ML + DevOps): a set of practices to design, build, and manage reproducible, testable, and sustainable ML-powered software. AIOps: AI + DevOps.
  2. Keeping SMEs who understand how to label and curate your data in the loop allows data scientists to inject domain expertise directly into the model. Once captured, this expert knowledge can be codified and deployed for programmatic supervision.
  3. Efficiency increase, time savings, reproducibility, community building, onboarding.
  4. 70% of respondents to a recent Harvard Business Review survey acknowledged they were not very effective at data sharing.¹ Organizations that share data externally with their partners generate three times more measurable economic benefits than their counterparts that do not.²
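The DataOps note lists data testing and data versioning among the main tasks. A minimal, standard-library-only Python sketch of both follows, assuming a batch of records represented as a list of dicts. The column names (`order_id`, `amount`), the range bounds, and all helper names are hypothetical; in practice, dedicated data-quality and data-versioning tools would usually take this role, and this is not presented as the author's method.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class CheckResult:
    """Outcome of one data check: its name, pass/fail, and offending row indices."""
    name: str
    passed: bool
    failures: list = field(default_factory=list)


def check_not_null(rows, column):
    # Flag rows where the column is missing or None.
    bad = [i for i, row in enumerate(rows) if row.get(column) is None]
    return CheckResult(f"not_null({column})", not bad, bad)


def check_unique(rows, column):
    # Flag every row whose value already appeared in an earlier row.
    seen, dupes = set(), []
    for i, row in enumerate(rows):
        value = row.get(column)
        if value in seen:
            dupes.append(i)
        seen.add(value)
    return CheckResult(f"unique({column})", not dupes, dupes)


def check_in_range(rows, column, low, high):
    # Flag non-null values outside the inclusive [low, high] interval.
    bad = [
        i for i, row in enumerate(rows)
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]
    return CheckResult(f"in_range({column}, {low}, {high})", not bad, bad)


def dataset_version(rows):
    """Short content hash of a batch: same rows in the same order -> same id.

    Keys are sorted before hashing, so dict key order does not change the id.
    """
    payload = json.dumps(rows, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def run_suite(rows):
    # Hypothetical checks for an orders table; adapt columns and bounds to real data.
    return [
        check_not_null(rows, "order_id"),
        check_unique(rows, "order_id"),
        check_in_range(rows, "amount", 0, 10_000),
    ]
```

For example, running `run_suite` on a batch that contains a missing `order_id`, a duplicated `order_id`, and a negative `amount` yields three failing results, each listing the offending row indices. Because `dataset_version` is a content hash, rerunning a pipeline on unchanged data reproduces the same id, which gives collaborators a shared handle for reproducing and comparing runs.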