SlideShare ist ein Scribd-Unternehmen logo
1 von 21
Downloaden Sie, um offline zu lesen
Democratizing Data
Architecting Terabyte Common Data Models and
Configuration Driven Pipelines For AI Platforms
Cindy Mottershead, AI Architect, Blackbaud
Shiran Algai, Senior Manager of Software Development, Blackbaud
Agenda
Shiran Algai
▪ Problem Statement
▪ Architecture Journey
Cindy Mottershead
▪ Architecture decisions
▪ Common Data Model
▪ Configuration Driven Pipeline
▪ Transformation building blocks
▪ AI Feedback Loop
We are the world’s
leading cloud software
company powering
social good.
Millions of users in
over 100 countries
The world’s 18th largest
SaaS applications provider*
Fortune 56 Companies
Changing the World*
*2017
Problem
▪ Data is very siloed
▪ Similar entities are described entirely differently by every product
▪ Bringing on new sources continues to compound the issue
▪ Data is frequently entered slightly differently for same entity
▪ Engineering teams are unable to leverage data to drive insight and
help our customers solve their problems.
▪ AI ETL cycle far too long
▪ AI ability to explore data extremely limited
First Steps
▪ Had beginnings of a few data lake projects, but scattered a bit
throughout organization
▪ Built consensus and momentum toward a common delta lake
▪ Started on MS tooling in Azure (data factory, U-SQL run by data lake
analytics jobs, etc.)
▪ Leverage as many Azure PaaS tools as possible
▪ Batch only
▪ Picked a small "bore hole through the mountain" approach
First Steps
Pivoting
▪ Painful adding new readers for different
sources not natively supported (Avro, Parquet)
▪ Gaps in Azure data tooling for our specific use
cases
▪ Desire for batch AND streaming through a
similar path
▪ Need the ability to compact records,
recreating legacy datasets in the platform
▪ Ability to hire data engineers in the market
easily
Solution
Data Platform Ecosystem
▪ Delta Lake
▪ Azure Data Lake Store
▪ Data Catalog Service
▪ Lake Authorization Service
▪ Ingestion Service
▪ Output service
▪ Async messaging contract
broker service
Service A
Service B
Data Catalog
Uses ACB as a source
for new catalog entries
Async Contract Broker
Service
Stores message schemas
Prevents breaking schema
changes
Ingestion Service
Automatically
subscribes to new and
existing topics
…
82
more
Lake
Staging Zone Raw Zone
Compacted daily
Trusted Zone
CDM tables
Service Bus
Topic
Common Data Models
▪ Downstream services + Data Scientists all leverage same common
models, accelerating development
▪ Common defined structure
▪ Consistent Naming of tables, structures, fields
▪ Consistent across all applications and application types
▪ Manage multiple data sources
▪ Remove complexities & specifics of source systems
▪ Shows the data “As is” (natural values)
▪ Provides common groupings & coding of data values (derived values)
▪ Integrated with Value-Added Services
CDM_Person
Common Data Model Input
▪ Thousands of relational tables
▪ Csv, json, parquet, avro, etc formatted input files
▪ Normalized and denormalized input
▪ Nested objects
▪ SQL Server, Mariadb, Oracle, flat files
▪ Change events
Configuration Driven Pipeline
▪ Common Id
▪ Metadata Map
▪ Pipeline
▪ Transformations
Transformation building
blocks
a) Filters
b) View
c) One to One (with SQL transform,
with Lookup)
d) One row to Many Rows (unpivot)
e) Many rows to array in one
column
f) Aggregations
ML Feedback Loops
Full cycle of model
deployment, tying
actions taken back
into model PROVIDE FULL CYCLE
OF DATA FROM
PRESENTATION, USER
INTERACTION, RESULT
S
ALLOWS MONITORING
AND TUNING OF ML
MODELS
PROVIDES METRICS
FOR ROADMAP
PRIORITIZATION
PROVIDES METRICS
FOR A/B TESTING
Tying It All Together
▪ Data flows from various products
▪ Ingested
▪ Transformed via Configuration Driven Pipelines
▪ One Common Data Model
▪ Data flows out of common data models back into ecosystem
▪ Baked in feedback loops
Democratized Data
▪ Data Scientists can access data directly from the CDM
▪ CDM is a Delta table
▪ Views are created for security access (no access to PII)
▪ Access is controlled at the view level
▪ Data is projected (using Schema on Read) to any destination location
(blob, SQL Server, Cosmos, etc)
▪ Data Scientists and Engineers can request any dataset they need by specifying metadata
▪ Requested data is transformed based on the metadata description
▪ Data is streamed or batched out to destination based on metadata frequency info
Feedback
Your feedback is important to us.
Don’t forget to rate and
review the sessions.
Democratizing Data Architecture for AI

Weitere ähnliche Inhalte

Was ist angesagt?

Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...HostedbyConfluent
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data MeshLibbySchulze
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
 
Considerations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseConsiderations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseDatabricks
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaJeffrey T. Pollock
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogDATAVERSITY
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingDatabricks
 
Straight Talk to Demystify Data Lineage
Straight Talk to Demystify Data LineageStraight Talk to Demystify Data Lineage
Straight Talk to Demystify Data LineageDATAVERSITY
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architectureAdam Doyle
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureDatabricks
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricNathan Bijnens
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation Brett VanderPlaats
 
Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020Julien Le Dem
 
Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher   Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher Tamir Dresher
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationDATAVERSITY
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...Databricks
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 

Was ist angesagt? (20)

Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Considerations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseConsiderations for Data Access in the Lakehouse
Considerations for Data Access in the Lakehouse
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafka
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data Catalog
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured Streaming
 
Straight Talk to Demystify Data Lineage
Straight Talk to Demystify Data LineageStraight Talk to Demystify Data Lineage
Straight Talk to Demystify Data Lineage
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architecture
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft Fabric
 
Data Mesh
Data MeshData Mesh
Data Mesh
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation
 
Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020
 
Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher   Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher
 
Data streaming fundamentals
Data streaming fundamentalsData streaming fundamentals
Data streaming fundamentals
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital Transformation
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
 

Ähnlich wie Democratizing Data Architecture for AI

Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...
Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...
Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...Databricks
 
A Successful Journey to the Cloud with Data Virtualization
A Successful Journey to the Cloud with Data VirtualizationA Successful Journey to the Cloud with Data Virtualization
A Successful Journey to the Cloud with Data VirtualizationDenodo
 
An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...DataWorks Summit
 
Logical Data Fabric and Data Mesh – Driving Business Outcomes
Logical Data Fabric and Data Mesh – Driving Business OutcomesLogical Data Fabric and Data Mesh – Driving Business Outcomes
Logical Data Fabric and Data Mesh – Driving Business OutcomesDenodo
 
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)Denodo
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
So You Want to Build a Data Lake?
So You Want to Build a Data Lake?So You Want to Build a Data Lake?
So You Want to Build a Data Lake?David P. Moore
 
Flash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lonFlash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lonJeffrey T. Pollock
 
Agile Data Engineering - Intro to Data Vault Modeling (2016)
Agile Data Engineering - Intro to Data Vault Modeling (2016)Agile Data Engineering - Intro to Data Vault Modeling (2016)
Agile Data Engineering - Intro to Data Vault Modeling (2016)Kent Graziano
 
How a Data Mesh is Driving our Platform | Trey Hicks, Gloo
How a Data Mesh is Driving our Platform | Trey Hicks, GlooHow a Data Mesh is Driving our Platform | Trey Hicks, Gloo
How a Data Mesh is Driving our Platform | Trey Hicks, GlooHostedbyConfluent
 
The Future of Data Warehousing and Data Integration
The Future of Data Warehousing and Data IntegrationThe Future of Data Warehousing and Data Integration
The Future of Data Warehousing and Data IntegrationEric Kavanagh
 
An overview of modern scalable web development
An overview of modern scalable web developmentAn overview of modern scalable web development
An overview of modern scalable web developmentTung Nguyen
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...DataStax
 
Relational Database Stockholm Syndrome (Neal Murray, 6 Point 6) London 2019 C...
Relational Database Stockholm Syndrome (Neal Murray, 6 Point 6) London 2019 C...Relational Database Stockholm Syndrome (Neal Murray, 6 Point 6) London 2019 C...
Relational Database Stockholm Syndrome (Neal Murray, 6 Point 6) London 2019 C...confluent
 
Why a Data Services Marketplace is Critical for a Successful Data-Driven Ente...
Why a Data Services Marketplace is Critical for a Successful Data-Driven Ente...Why a Data Services Marketplace is Critical for a Successful Data-Driven Ente...
Why a Data Services Marketplace is Critical for a Successful Data-Driven Ente...Denodo
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataAshnikbiz
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...Igor De Souza
 

Ähnlich wie Democratizing Data Architecture for AI (20)

Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...
Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...
Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...
 
A Successful Journey to the Cloud with Data Virtualization
A Successful Journey to the Cloud with Data VirtualizationA Successful Journey to the Cloud with Data Virtualization
A Successful Journey to the Cloud with Data Virtualization
 
An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...
 
Logical Data Fabric and Data Mesh – Driving Business Outcomes
Logical Data Fabric and Data Mesh – Driving Business OutcomesLogical Data Fabric and Data Mesh – Driving Business Outcomes
Logical Data Fabric and Data Mesh – Driving Business Outcomes
 
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
So You Want to Build a Data Lake?
So You Want to Build a Data Lake?So You Want to Build a Data Lake?
So You Want to Build a Data Lake?
 
Flash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lonFlash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lon
 
Agile Data Engineering - Intro to Data Vault Modeling (2016)
Agile Data Engineering - Intro to Data Vault Modeling (2016)Agile Data Engineering - Intro to Data Vault Modeling (2016)
Agile Data Engineering - Intro to Data Vault Modeling (2016)
 
How a Data Mesh is Driving our Platform | Trey Hicks, Gloo
How a Data Mesh is Driving our Platform | Trey Hicks, GlooHow a Data Mesh is Driving our Platform | Trey Hicks, Gloo
How a Data Mesh is Driving our Platform | Trey Hicks, Gloo
 
The Future of Data Warehousing and Data Integration
The Future of Data Warehousing and Data IntegrationThe Future of Data Warehousing and Data Integration
The Future of Data Warehousing and Data Integration
 
An overview of modern scalable web development
An overview of modern scalable web developmentAn overview of modern scalable web development
An overview of modern scalable web development
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
 
Relational Database Stockholm Syndrome (Neal Murray, 6 Point 6) London 2019 C...
Relational Database Stockholm Syndrome (Neal Murray, 6 Point 6) London 2019 C...Relational Database Stockholm Syndrome (Neal Murray, 6 Point 6) London 2019 C...
Relational Database Stockholm Syndrome (Neal Murray, 6 Point 6) London 2019 C...
 
Why a Data Services Marketplace is Critical for a Successful Data-Driven Ente...
Why a Data Services Marketplace is Critical for a Successful Data-Driven Ente...Why a Data Services Marketplace is Critical for a Successful Data-Driven Ente...
Why a Data Services Marketplace is Critical for a Successful Data-Driven Ente...
 
L17 Data Source Layer
L17 Data Source LayerL17 Data Source Layer
L17 Data Source Layer
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
 

Mehr von Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Mehr von Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Kürzlich hochgeladen

Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxellehsormae
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGIThomas Poetter
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
detection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxdetection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxAleenaJamil4
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 

Kürzlich hochgeladen (20)

Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptx
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
detection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxdetection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptx
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 

Democratizing Data Architecture for AI

  • 1.
  • 2. Democratizing Data Architecting Terabyte Common Data Models and Configuration Driven Pipelines For AI Platforms Cindy Mottershead, AI Architect, Blackbaud Shiran Algai, Senior Manager of Software Development, Blackbaud
  • 3. Agenda Shiran Algai ▪ Problem Statement ▪ Architecture Journey Cindy Mottershead ▪ Architecture decisions ▪ Common Data Model ▪ Configuration Driven Pipeline ▪ Transformation building blocks ▪ AI Feedback Loop
  • 4. We are the world’s leading cloud software company powering social good. Millions of users in over 100 countries The world’s 18th largest SaaS applications provider* Fortune 56 Companies Changing the World* *2017
  • 5. Problem ▪ Data is very siloed ▪ Similar entities are described entirely differently by every product ▪ Bringing on new sources continues to compound the issue ▪ Data is frequently entered slightly differently for same entity ▪ Engineering teams are unable to leverage data to drive insight and help our customers solve their problems. ▪ AI ETL cycle far too long ▪ AI ability to explore data extremely limited
  • 6. First Steps ▪ Had beginnings of a few data lake projects, but scattered a bit throughout organization ▪ Built consensus and momentum toward a common delta lake ▪ Started on MS tooling in Azure (data factory, U-SQL run by data lake analytics jobs, etc.) ▪ Leverage as many Azure PaaS tools as possible ▪ Batch only ▪ Picked a small "bore hole through the mountain" approach
  • 8. Pivoting ▪ Painful adding new readers for different sources not natively supported (Avro, Parquet) ▪ Gaps in Azure data tooling for our specific use cases ▪ Desire for batch AND streaming through a similar path ▪ Need the ability to compact records, recreating legacy datasets in the platform ▪ Ability to hire data engineers in the market easily
  • 10. Data Platform Ecosystem ▪ Delta Lake ▪ Azure Data Lake Store ▪ Data Catalog Service ▪ Lake Authorization Service ▪ Ingestion Service ▪ Output service ▪ Async messaging contract broker service
  • 11. Service A Service B Data Catalog Uses ACB as a source for new catalog entries Async Contract Broker Service Stores message schemas Prevents breaking schema changes Ingestion Service Automatically subscribes to new and existing topics … 82 more Lake Staging Zone Raw Zone Compacted daily Trusted Zone CDM tables Service Bus Topic
  • 12. Common Data Models ▪ Downstream services + Data Scientists all leverage same common models, accelerating development ▪ Common defined structure ▪ Consistent Naming of tables, structures, fields ▪ Consistent across all applications and application types ▪ Manage multiple data sources ▪ Remove complexities & specifics of source systems ▪ Shows the data “As is” (natural values) ▪ Provides common groupings & coding of data values (derived values) ▪ Integrated with Value-Added Services
  • 14. Common Data Model Input ▪ Thousands of relational tables ▪ Csv, json, parquet, avro, etc formatted input files ▪ Normalized and denormalized input ▪ Nested objects ▪ SQL Server, Mariadb, Oracle, flat files ▪ Change events
  • 15. Configuration Driven Pipeline ▪ Common Id ▪ Metadata Map ▪ Pipeline ▪ Transformations
  • 16. Transformation building blocks a) Filters b) View c) One to One (with SQL transform, with Lookup) d) One row to Many Rows (unpivot) e) Many rows to array in one column f) Aggregations
  • 17. ML Feedback Loops Full cycle of model deployment, tying actions taken back into model PROVIDE FULL CYCLE OF DATA FROM PRESENTATION, USER INTERACTION, RESULT S ALLOWS MONITORING AND TUNING OF ML MODELS PROVIDES METRICS FOR ROADMAP PRIORITIZATION PROVIDES METRICS FOR A/B TESTING
  • 18. Tying It All Together ▪ Data flows from various products ▪ Ingested ▪ Transformed via Configuration Driven Pipelines ▪ One Common Data Model ▪ Data flows out of common data models back into ecosystem ▪ Baked in feedback loops
  • 19. Democratized Data ▪ Data Scientists can access data directly from the CDM ▪ CDM is a Delta table ▪ Views are created for security access (no access to PII) ▪ Access is controlled at the view level ▪ Data is projected (using Schema on Read) to any destination location (blob, SQL Server, Cosmos, etc) ▪ Data Scientists and Engineers can request any dataset they need by specifying metadata ▪ Requested data is transformed based on the metadata description ▪ Data is streamed or batched out to destination based on metadata frequency info
  • 20. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.