Pulsar Virtual Summit Europe 2021
Pulsar in the Lakehouse
Ryan Zhu
Staff Software Engineer, Databricks
Addison Higham
Chief Architect, StreamNative
Ryan Zhu, Staff Software Engineer at Databricks
● Tech Lead of Delta Ecosystem team
● Apache Spark PMC member and committer
● Experience:
○ One of the core developers of Delta Lake and Spark Structured Streaming; has worked on both projects since the beginning.
○ Recently working on Delta Sharing, a new open protocol for sharing data.
Addison Higham, Chief Architect at StreamNative
● Apache Pulsar Committer
● Experience:
○ 10+ years as a Software Engineer, with 7 years working on streaming systems.
○ 3+ years of Pulsar experience, including leading the successful adoption of Pulsar at Instructure.
Delta Lake/Lakehouse Overview
● Data is fragmented across many systems
● Cost and complexity are a drag on the organization
● Silos get in the way of data team collaboration
Data infrastructure is too complicated
[Diagram labels: Structured, Semi-structured, and Unstructured data; Data Lake; Machine Learning; Data Science; Data Warehouse and BI (shown three times)]
Data Warehouse
Pros
● Great for Business Intelligence (BI) applications
Cons
● Limited support for Machine Learning (ML) workloads
● Proprietary systems with only a SQL interface
Data Lake
Pros
● Supports ML
● Completely open ecosystem of tools and formats
Cons
● Poor support for BI
● Complex to manage and govern → data swamp
Lakehouse
One platform to unify all your data, analytics, and AI workloads
BI & SQL
Open Data Lake
Data Management & Governance
Real-time Data Applications
Data Science & ML
Building the foundation of a Lakehouse - Delta Lake
Sources: CSV, JSON, TXT…; Kinesis
BRONZE: Raw Ingestion and History
SILVER: Filtered, Cleaned, Augmented
GOLD: Business-level Aggregates
Consumers: BI & Reporting; Streaming Analytics; Data Science & ML
QUALITY increases from BRONZE to GOLD
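The bronze, silver, and gold stages can be illustrated with a small pure-Python sketch. It is only a toy model of the idea (keep raw data, then clean it, then aggregate it); the sample rows, the column names, and the `to_silver` / `to_gold` helpers are hypothetical and are not part of Delta Lake or Spark.

```python
from collections import defaultdict

# Hypothetical raw events as they might land in a bronze table.
# Bronze keeps everything as-is, including bad rows.
bronze = [
    {"user": "ana", "amount": "12.50", "country": "pt"},
    {"user": "bob", "amount": "oops", "country": "US"},  # malformed amount
    {"user": "ana", "amount": "7.50", "country": "PT"},
    {"user": None, "amount": "3.00", "country": "DE"},   # missing user
]


def to_silver(rows):
    """Filter and clean: drop invalid rows, normalise types and casing."""
    cleaned = []
    for row in rows:
        if not row["user"]:
            continue  # a user is required
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # amount is not a number
        cleaned.append(
            {"user": row["user"], "amount": amount, "country": row["country"].upper()}
        )
    return cleaned


def to_gold(rows):
    """Business-level aggregate: total amount per country."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["country"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
```

In a real Lakehouse each stage would typically be its own Delta table, so the raw history in bronze stays available for reprocessing when the cleaning rules change.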
350+ PB processed / day
75% Data Scanned
3K+ Customers in Production
OSS Delta Lake Key Features
● ACID Transactions: Delta Lake brings ACID transactions to your data lakes. It provides serializability, the strongest isolation level. Learn more at Diving into Delta Lake: Unpacking the Transaction Log.
● Scalable Metadata Handling: Delta Lake can handle petabyte-scale tables with billions of partitions and files with ease.
● Time Travel (data versioning): Delta Lake provides data snapshots to access and revert to earlier versions of data for audits, rollbacks, or to reproduce experiments.
● Open Format: All data in Delta Lake is stored in Apache Parquet format, enabling Delta Lake to leverage the efficient compression and encoding schemes that are native to Parquet.
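To make the link between the transaction log and time travel concrete, here is a minimal sketch that replays a list of commits to rebuild the table state at any version. It is a toy model under stated assumptions: the `add` and `remove` keys mirror action names used in the Delta transaction log, but the file names, the in-memory `log` list, and the `snapshot` helper are hypothetical. Real Delta Lake keeps its commits as files under the table's `_delta_log` directory.

```python
# Each commit is a list of actions. File names are hypothetical.
log = [
    [{"add": "part-000.parquet"}, {"add": "part-001.parquet"}],     # version 0
    [{"add": "part-002.parquet"}],                                   # version 1
    [{"remove": "part-000.parquet"}, {"add": "part-003.parquet"}],  # version 2
]


def snapshot(log, version=None):
    """Return the set of live files at `version` (latest when None).

    The state is rebuilt by replaying every commit up to and including
    the requested version, in order.
    """
    if version is None:
        version = len(log) - 1
    if not 0 <= version < len(log):
        raise ValueError(f"unknown version {version}")
    files = set()
    for commit in log[: version + 1]:
        for action in commit:
            if "add" in action:
                files.add(action["add"])
            elif "remove" in action:
                files.discard(action["remove"])
    return files
```

Because a commit is applied as a whole or not at all, readers that replay the log never see a half-written version, and an older version stays readable for as long as its files are retained.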
OSS Delta Lake Key Features (Continued)
● Unified Batch and Streaming Source and Sink: A table in Delta Lake is both a batch table and a streaming source and sink. Streaming data ingest, batch historic backfill, and interactive queries all just work out of the box.
● Schema Enforcement and Evolution: Delta Lake provides the ability to specify your schema and enforce it. This helps ensure that the data types are correct and required columns are present, preventing bad data from causing data corruption. For more information, refer to Diving Into Delta Lake: Schema Enforcement & Evolution.
● Audit History: The Delta Lake transaction log records details about every change made to data, providing a full audit trail of the changes.
● DML Operations: Delta Lake supports SQL, Scala / Java, and Python APIs to merge, update, and delete datasets, allowing you to easily comply with GDPR and CCPA and simplifying use cases like change data capture. For more information, refer to Diving Into Delta Lake: DML Internals.
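Schema enforcement and evolution can be sketched in a few lines of plain Python. This is a toy model rather than Delta Lake's actual rules: the `EXPECTED_SCHEMA` table, the `SchemaMismatch` error, and the `enforce` / `evolve` helpers are all hypothetical names chosen for illustration.

```python
EXPECTED_SCHEMA = {"id": int, "email": str}  # hypothetical table schema


class SchemaMismatch(Exception):
    """Raised when a row does not match the table schema."""


def enforce(schema, row):
    """Reject rows with missing columns, unknown columns, or wrong types."""
    missing = schema.keys() - row.keys()
    extra = row.keys() - schema.keys()
    if missing or extra:
        raise SchemaMismatch(f"missing={sorted(missing)} extra={sorted(extra)}")
    for name, expected_type in schema.items():
        if not isinstance(row[name], expected_type):
            raise SchemaMismatch(f"column {name!r} expects {expected_type.__name__}")
    return row


def evolve(schema, new_columns):
    """Schema evolution: return a new schema with explicitly added columns."""
    return {**schema, **new_columns}
```

The design point is that a write with an unexpected shape fails loudly instead of silently corrupting the table, while widening the schema is a deliberate, separate step.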
Upcoming features
● Column dropping and renaming: Allow users to drop a column and rename a column.
● Atomic data replacement: Allow users to delete a portion of data from the table and replace it with new data atomically.
● Schema evolution improvement for MERGE: StructType in ArrayType will support schema evolution in the MERGE command.
● MERGE support for generated columns: Generated Columns is a feature added in Delta 1.0 to support generating columns based on SQL expressions. MERGE will support these columns.
● New release cadence: One release every 3 months.
Delta Lake ecosystem
Ecosystem Project: Status
● Delta Standalone Reader: Available
● Delta Standalone Writer: Q4 ’21
● Flink/Delta Source: Q1 ’22
● Flink/Delta Sink: Q4 ’21
● Pulsar/Delta Source: Q4 ’21
● Pulsar/Delta Sink: Q1 ’22
● PrestoDB/Trino integration: Q4 ’21
● Rust Integration (kafka-delta-ingest): Available
● Nessie Integration: Q4 ’21
● LakeFS Integration: Q4 ’21
● Hive3 Connector: Available
● Spark 3.2 Support: Q4 ’21
Pulsar + Lakehouse
Pulsar is the unified messaging and
streaming platform for real-time teams
Why Pulsar?
● Streams and messages to support more workloads
● Multi-tenancy to break down data silos and ease data ingestion
● Geo-replication to support multi-cloud and global business
Pulsar + Delta Lake enable data unification
● Delta Lake and the Lakehouse support a unified system for data, analytics, and ML
● Pulsar unifies real-time data across diverse use cases like streaming, messaging, and microservices
Pulsar + Delta Lake = simplified data infrastructure across your entire organization
Pulsar, Delta Lake, and Spark Connectors
The Pulsar and Spark/Delta Lake communities are committed to building solid integrations.
● Spark Pulsar Connector: Connectors for Spark that read data from and write data to Pulsar, for use with the DataFrame and DataStream APIs. https://github.com/streamnative/pulsar-spark. Discussions are in progress for an upstream contribution.
● Pulsar IO Delta Lake Source: A Pulsar “Source” for reading data directly from Delta Lake within the Pulsar IO framework. It is built on top of the Delta Standalone project. In progress; expect a first release this year.
[Diagram labels: Database; Pulsar IO; Pulsar Source; Pulsar protocol handlers: Pulsar, KoP, AoP, WebSocket, HTTP]
Pulsar offers many integration options, including the Pulsar protocol, KoP, AoP, and connectors, for connecting with many systems in real time.
Delta Lake Connectors allow for data to be exchanged between Delta
Lake and Pulsar.
Spark’s Pulsar connector allows developers to write Spark jobs that read data from Pulsar topics, transform the data, and write it back to Pulsar topics.
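A read, transform, and write job of this kind might look like the PySpark sketch below. Treat it as a sketch under assumptions: the option names (`service.url`, `admin.url`, `topic`) and the `value` payload column follow the streamnative/pulsar-spark README as best recalled and can differ between connector versions; the `pulsar_options` and `build_pipeline` helpers, the topic names, and the URLs are hypothetical. Running `build_pipeline` requires a SparkSession with the connector on its classpath and a reachable Pulsar cluster.

```python
def pulsar_options(service_url, admin_url, topic):
    """Build connector options (names assumed from the pulsar-spark README)."""
    return {"service.url": service_url, "admin.url": admin_url, "topic": topic}


def build_pipeline(spark, source_topic, sink_topic, service_url, admin_url, checkpoint_dir):
    """Read from one Pulsar topic, upper-case the payload, write to another.

    `spark` is an existing SparkSession. Returns the started StreamingQuery.
    """
    # Imported lazily so that pulsar_options() can be used without Spark installed.
    from pyspark.sql.functions import col, upper

    events = (
        spark.readStream.format("pulsar")
        .options(**pulsar_options(service_url, admin_url, source_topic))
        .load()
    )
    # Assumption: the message payload is exposed in a column named "value".
    transformed = events.select(upper(col("value").cast("string")).alias("value"))
    return (
        transformed.writeStream.format("pulsar")
        .options(**pulsar_options(service_url, admin_url, sink_topic))
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
```

A call such as `build_pipeline(spark, "events-in", "events-out", "pulsar://localhost:6650", "http://localhost:8080", "/tmp/checkpoints")` would start the stream; all of those values are placeholders.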
Application events stored in Delta Lake for use in ML
ML Results made available to applications
CDC Events transformed and stored in Delta Lake
Data from other systems made available in Delta Lake for Data Science
Pulsar IO Delta Lake Source
With the Pulsar IO Delta Lake source, users will be able to ingest Delta Lake changes
into Pulsar without running a separate component
[Diagram labels: Delta Lake Source; Metadata Change; New File; Removed File; Parquet File; Update Schema; Records; Write Records; Topic w/ Schema]
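One way to picture the source is as a translator from Delta commit actions to events bound for a Pulsar topic. The sketch below is purely illustrative: the `metaData`, `add`, and `remove` action names and the `schemaString` and `path` fields come from the Delta transaction log format, but the `to_pulsar_events` helper and the event shapes it returns are hypothetical, and the real connector's output may differ.

```python
def to_pulsar_events(commit):
    """Translate one Delta commit (a list of actions) into topic events.

    Hypothetical mapping for illustration:
      metaData -> update the topic schema
      add      -> read the new Parquet file and write its records
      remove   -> report that a file was removed
    Other actions (for example commitInfo) are ignored.
    """
    events = []
    for action in commit:
        if "metaData" in action:
            events.append(
                {"type": "update_schema", "schema": action["metaData"]["schemaString"]}
            )
        elif "add" in action:
            events.append({"type": "write_records", "file": action["add"]["path"]})
        elif "remove" in action:
            events.append({"type": "file_removed", "file": action["remove"]["path"]})
    return events


# Hypothetical commit with one action of each kind.
example_commit = [
    {"commitInfo": {"operation": "WRITE"}},
    {"metaData": {"schemaString": '{"type":"struct","fields":[]}'}},
    {"add": {"path": "part-0001.parquet"}},
    {"remove": {"path": "part-0000.parquet"}},
]
example_events = to_pulsar_events(example_commit)
```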
Future of Pulsar + Delta Lake
One of Pulsar’s unique features is tiered storage, which allows streams to be offloaded out of Apache BookKeeper into S3, GCS, etc.
Work is in progress to offload data as Delta Lake compatible files with the required metadata. This lets Pulsar make streams available to Delta Lake without copying data out of Pulsar, while the data can still be read as streams.
Stay connected to learn more in early 2022!
Pulsar and Delta Lake are technologies
designed to simplify your data
infrastructure
Connect with us on #connector-pulsar in
Delta Lake Slack to learn more!
Thank you!

More Related Content

What's hot

An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)Brian Brazil
 
Monitoring With Prometheus
Monitoring With PrometheusMonitoring With Prometheus
Monitoring With PrometheusKnoldus Inc.
 
Grafana optimization for Prometheus
Grafana optimization for PrometheusGrafana optimization for Prometheus
Grafana optimization for PrometheusMitsuhiro Tanda
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing GuideJose De La Rosa
 
Monitoring with prometheus
Monitoring with prometheusMonitoring with prometheus
Monitoring with prometheusKasper Nissen
 
Best Practices for Becoming an Exceptional Postgres DBA
Best Practices for Becoming an Exceptional Postgres DBA Best Practices for Becoming an Exceptional Postgres DBA
Best Practices for Becoming an Exceptional Postgres DBA EDB
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsDatabricks
 
Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache KuduJeff Holoman
 
Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup) Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup) Roopa Tangirala
 
Amazon Redshift Tutorial | AWS Tutorial for Beginners | AWS Certification Tra...
Amazon Redshift Tutorial | AWS Tutorial for Beginners | AWS Certification Tra...Amazon Redshift Tutorial | AWS Tutorial for Beginners | AWS Certification Tra...
Amazon Redshift Tutorial | AWS Tutorial for Beginners | AWS Certification Tra...Edureka!
 
Infrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using PrometheusInfrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using PrometheusMarco Pas
 
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanWebinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanVerverica
 
(알도개) GraalVM – 자바를 넘어선 새로운 시작의 서막
(알도개) GraalVM – 자바를 넘어선 새로운 시작의 서막(알도개) GraalVM – 자바를 넘어선 새로운 시작의 서막
(알도개) GraalVM – 자바를 넘어선 새로운 시작의 서막Jay Park
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta LakeDatabricks
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code GenerationDatabricks
 
Spark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike PercySpark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike PercySpark Summit
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon Web Services
 

What's hot (20)

An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)
 
Monitoring With Prometheus
Monitoring With PrometheusMonitoring With Prometheus
Monitoring With Prometheus
 
Grafana optimization for Prometheus
Grafana optimization for PrometheusGrafana optimization for Prometheus
Grafana optimization for Prometheus
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing Guide
 
Monitoring with prometheus
Monitoring with prometheusMonitoring with prometheus
Monitoring with prometheus
 
Best Practices for Becoming an Exceptional Postgres DBA
Best Practices for Becoming an Exceptional Postgres DBA Best Practices for Becoming an Exceptional Postgres DBA
Best Practices for Becoming an Exceptional Postgres DBA
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
 
Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache Kudu
 
Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup) Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup)
 
Amazon Redshift Tutorial | AWS Tutorial for Beginners | AWS Certification Tra...
Amazon Redshift Tutorial | AWS Tutorial for Beginners | AWS Certification Tra...Amazon Redshift Tutorial | AWS Tutorial for Beginners | AWS Certification Tra...
Amazon Redshift Tutorial | AWS Tutorial for Beginners | AWS Certification Tra...
 
Advanced Terraform
Advanced TerraformAdvanced Terraform
Advanced Terraform
 
Infrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using PrometheusInfrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using Prometheus
 
Kudu Deep-Dive
Kudu Deep-DiveKudu Deep-Dive
Kudu Deep-Dive
 
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanWebinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
 
(알도개) GraalVM – 자바를 넘어선 새로운 시작의 서막
(알도개) GraalVM – 자바를 넘어선 새로운 시작의 서막(알도개) GraalVM – 자바를 넘어선 새로운 시작의 서막
(알도개) GraalVM – 자바를 넘어선 새로운 시작의 서막
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta Lake
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code Generation
 
TiDB Introduction
TiDB IntroductionTiDB Introduction
TiDB Introduction
 
Spark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike PercySpark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike Percy
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best Practices
 

Similar to Pulsar in the Lakehouse: Apache Pulsar™ with Apache Spark™ and Delta Lake - Pulsar Summit Europe 2021 Keynote

Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeFlink Forward
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Helena Edelson
 
Realizing the Promise of Portable Data Processing with Apache Beam
Realizing the Promise of Portable Data Processing with Apache BeamRealizing the Promise of Portable Data Processing with Apache Beam
Realizing the Promise of Portable Data Processing with Apache BeamDataWorks Summit
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks
 
Present and future of unified, portable and efficient data processing with Ap...
Present and future of unified, portable and efficient data processing with Ap...Present and future of unified, portable and efficient data processing with Ap...
Present and future of unified, portable and efficient data processing with Ap...DataWorks Summit
 
Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要Paulo Gutierrez
 
Pulsar in the Lakehouse: Overview of Apache Pulsar and Delta Lake Connector -...
Pulsar in the Lakehouse: Overview of Apache Pulsar and Delta Lake Connector -...Pulsar in the Lakehouse: Overview of Apache Pulsar and Delta Lake Connector -...
Pulsar in the Lakehouse: Overview of Apache Pulsar and Delta Lake Connector -...StreamNative
 
Leverage Kafka to build a stream processing platform
Leverage Kafka to build a stream processing platformLeverage Kafka to build a stream processing platform
Leverage Kafka to build a stream processing platformconfluent
 
Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7Jack Gudenkauf
 
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­ticaA noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­ticaData Con LA
 
Overview SQL Server 2019
Overview SQL Server 2019Overview SQL Server 2019
Overview SQL Server 2019Juan Fabian
 
Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...StreamNative
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptxAlex Ivy
 
Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...DataWorks Summit
 
Realizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache BeamRealizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache BeamDataWorks Summit
 
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021StreamNative
 
Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...
Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...
Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...Lightbend
 
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...DataStax Academy
 
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...Provectus
 

Similar to Pulsar in the Lakehouse: Apache Pulsar™ with Apache Spark™ and Delta Lake - Pulsar Summit Europe 2021 Keynote (20)

Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
 
Realizing the Promise of Portable Data Processing with Apache Beam
Realizing the Promise of Portable Data Processing with Apache BeamRealizing the Promise of Portable Data Processing with Apache Beam
Realizing the Promise of Portable Data Processing with Apache Beam
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
 
Present and future of unified, portable and efficient data processing with Ap...
Present and future of unified, portable and efficient data processing with Ap...Present and future of unified, portable and efficient data processing with Ap...
Present and future of unified, portable and efficient data processing with Ap...
 
Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要
 
What Is Delta Lake ???
What Is Delta Lake ???What Is Delta Lake ???
What Is Delta Lake ???
 
Pulsar in the Lakehouse: Overview of Apache Pulsar and Delta Lake Connector -...
Pulsar in the Lakehouse: Overview of Apache Pulsar and Delta Lake Connector -...Pulsar in the Lakehouse: Overview of Apache Pulsar and Delta Lake Connector -...
Pulsar in the Lakehouse: Overview of Apache Pulsar and Delta Lake Connector -...
 
Leverage Kafka to build a stream processing platform
Leverage Kafka to build a stream processing platformLeverage Kafka to build a stream processing platform
Leverage Kafka to build a stream processing platform
 
Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7
 
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­ticaA noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
 
Overview SQL Server 2019
Overview SQL Server 2019Overview SQL Server 2019
Overview SQL Server 2019
 
Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...
 
Realizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache BeamRealizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache Beam
 
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021
 
Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...
Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...
Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...
 
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
 
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
 

More from StreamNative

Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022StreamNative
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...StreamNative
 
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...StreamNative
 
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022StreamNative
 
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022StreamNative
 
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...StreamNative
 
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...StreamNative
 
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022StreamNative
 
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...StreamNative
 
Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022StreamNative
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...StreamNative
 
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022StreamNative
 
Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022StreamNative
 
Pulsar in the Lakehouse: Apache Pulsar™ with Apache Spark™ and Delta Lake - Pulsar Summit Europe 2021 Keynote

  • 1. Pulsar Virtual Summit Europe 2021 Pulsar in the Lakehouse Ryan Zhu Staff Software Engineer, Databricks Addison Higham Chief Architect, StreamNative
  • 2. Pulsar Virtual Summit Europe 2021 Ryan Zhu Staff Software Engineer Ryan Zhu, Staff Software Engineer at Databricks ● Tech Lead of Delta Ecosystem team ● Apache Spark PMC member and committer ● Experience: ○ One of the core developers of Delta Lake and Spark Structured Streaming. Working on these two projects since the beginning. ○ Recently working on Delta Sharing, a new open protocol for sharing data.
  • 3. Pulsar Virtual Summit Europe 2021 Addison Higham Chief Architect Addison Higham, Chief Architect at StreamNative ● Apache Pulsar Committer ● Experience: ○ 10+ years as Software Engineer, with 7 years working on streaming systems. ○ 3+ years Pulsar experience, including leading the successful adoption of Pulsar at Instructure.
  • 4. Pulsar Virtual Summit Europe 2021 Delta Lake/Lakehouse Overview
  • 5. Pulsar Virtual Summit Europe 2021 Data is fragmented across many systems Cost and complexity is a drag on the organization Silos get in the way of data team collaboration
  • 6. Pulsar Virtual Summit Europe 2021 Data infrastructure is too complicated Data Lake Semi-structured Data Warehouse Structured Machine Learning Data Science BI Unstructured Data Warehouse BI Data Warehouse BI
  • 7. Pulsar Virtual Summit Europe 2021 Data Warehouse: Pros: Great for Business Intelligence (BI) applications. Cons: Limited support for Machine Learning (ML) workloads; proprietary systems with only a SQL interface. Data Lake: Pros: Supports ML; completely open ecosystem of tools and formats. Cons: Poor support for BI; complex to manage and govern → data swamp.
  • 8. Pulsar Virtual Summit Europe 2021 Lakehouse One platform to unify all your data, analytics, and AI workloads BI & SQL Open Data Lake Data Management & Governance Real-time Data Applications Data Science & ML
  • 9. Pulsar Virtual Summit Europe 2021 QUALITY Filtered, Cleaned, Augmented Business-level Aggregates Raw Ingestion and History Building the foundation of a Lakehouse - Delta Lake CSV, JSON, TXT… Kinesis BI & Reporting Streaming Analytics Data Science & ML BRONZE SILVER GOLD
  • 10. Pulsar Virtual Summit Europe 2021 350+ PB processed / day 75% Data Scanned 3K+ Customers in Production
  • 11. Pulsar Virtual Summit Europe 2021 OSS Delta Lake Key Features. ACID Transactions: Delta Lake brings ACID transactions to your data lakes. It provides serializability, the strongest isolation level. Learn more at Diving into Delta Lake: Unpacking the Transaction Log. Scalable Metadata Handling: Delta Lake can handle petabyte-scale tables with billions of partitions and files with ease. Time Travel (data versioning): Delta Lake provides data snapshots to access and revert to earlier versions of data for audits, rollbacks or to reproduce experiments. Open Format: All data in Delta Lake is stored in Apache Parquet format, enabling Delta Lake to leverage the efficient compression and encoding schemes that are native to Parquet.
  • 12. Pulsar Virtual Summit Europe 2021 OSS Delta Lake Key Features (Continued). Unified Batch and Streaming Source and Sink: A table in Delta Lake is both a batch table and a streaming source and sink. Streaming data ingest, batch historic backfill, and interactive queries all just work out of the box. Schema Enforcement and Evolution: Delta Lake provides the ability to specify your schema and enforce it. This helps ensure that the data types are correct and required columns are present, preventing bad data from causing data corruption. For more information, refer to Diving Into Delta Lake: Schema Enforcement & Evolution. Audit History: The Delta Lake transaction log records details about every change made to data, providing a full audit trail of the changes. DML Operations: Delta Lake supports SQL, Scala / Java and Python APIs to merge, update and delete datasets, allowing you to easily comply with GDPR and CCPA and simplifying use cases like change data capture. For more information, refer to Diving Into Delta Lake: DML Internals.
  • 13. Pulsar Virtual Summit Europe 2021 Upcoming features. Column dropping and renaming: Allow users to drop a column and rename a column. Atomic data replacement: Allow users to delete a portion of data from the table and replace it with new data atomically. Schema evolution improvement for MERGE: StructType in ArrayType will support schema evolution in the MERGE command. MERGE support for generated columns: Generated Columns is a feature added in Delta 1.0 to support generating columns based on SQL expressions. MERGE will support these columns. New release cadence: One release every 3 months.
  • 14. Pulsar Virtual Summit Europe 2021 Delta Lake ecosystem (Project: Status). Delta Standalone Reader: Available; Delta Standalone Writer: Q4 '21; Flink/Delta Source: Q1 '22; Flink/Delta Sink: Q4 '21; Pulsar/Delta Source: Q4 '21; Pulsar/Delta Sink: Q1 '22; PrestoDB/Trino integration: Q4 '21; Rust Integration (kafka-delta-ingest): Available; Nessie Integration: Q4 '21; LakeFS Integration: Q4 '21; Hive3 Connector: Available; Spark 3.2 Support: Q4 '21.
  • 15. Pulsar Virtual Summit Europe 2021 Pulsar + Lakehouse
  • 16. Pulsar Virtual Summit Europe 2021 Pulsar is the unified messaging and streaming platform for real-time teams
  • 17. Pulsar Virtual Summit Europe 2021 Why Pulsar? Streams and messages to support more workloads. Multi-tenancy to break down data silos and ease data ingestion. Geo-replication to support multi-cloud and global business.
  • 18. Pulsar Virtual Summit Europe 2021 Pulsar + Delta Lake enable data unification. Pulsar unifies real-time data across diverse use cases like streaming, messaging, and microservices. Delta Lake and Lakehouse support a unified system for data, analytics, and ML. Pulsar + Delta Lake = simplified data infrastructure across your entire organization.
  • 19. Pulsar Virtual Summit Europe 2021 Pulsar, Delta Lake, and Spark Connectors. The Pulsar and Spark/Delta Lake communities are committed to building solid integrations. Spark Pulsar Connector: Connectors for Spark for reading and writing data from Pulsar for use with DataFrame and DataStream APIs. https://github.com/streamnative/pulsar-spark. Discussions in progress for upstream contribution. Pulsar IO Delta Lake Source: A Pulsar “Source” for reading data directly from Delta Lake within the Pulsar IO framework. It’s built on top of the Delta Standalone project. In progress, expect a first release this year.
  • 20. Pulsar Virtual Summit Europe 2021 [Diagram labels: Database; Pulsar IO; Pulsar Source; Pulsar, KoP, AoP, WebSocket, HTTP]
  • 21. Pulsar Virtual Summit Europe 2021 Pulsar offers many options for integration, including Pulsar, KoP, AoP, and connectors, to connect with many systems in real time. [Diagram labels: Database; Pulsar IO; Pulsar Source; Pulsar, KoP, AoP, WebSocket, HTTP]
  • 22. Pulsar Virtual Summit Europe 2021 Delta Lake Connectors allow data to be exchanged between Delta Lake and Pulsar. [Diagram labels: Database; Pulsar IO; Pulsar Source; Pulsar, KoP, AoP, WebSocket, HTTP]
  • 23. Pulsar Virtual Summit Europe 2021 Spark’s Pulsar connector allows developers to write Spark jobs that can read data from Pulsar topics, transform the data, and write back to Pulsar topics. [Diagram labels: Database; Pulsar IO; Pulsar Source; Pulsar, KoP, AoP, WebSocket, HTTP]
  • 24. Pulsar Virtual Summit Europe 2021 Application events stored in Delta Lake for use in ML. [Diagram labels: Database; Pulsar IO; Pulsar Source; Pulsar, KoP, AoP, WebSocket, HTTP]
  • 25. Pulsar Virtual Summit Europe 2021 ML results made available to applications. [Diagram labels: Database; Pulsar IO; Pulsar Source; Pulsar, KoP, AoP, WebSocket, HTTP]
  • 26. Pulsar Virtual Summit Europe 2021 CDC events transformed and stored in Delta Lake. [Diagram labels: Database; Pulsar IO; Pulsar Source; Pulsar, KoP, AoP, WebSocket, HTTP]
  • 27. Pulsar Virtual Summit Europe 2021 Other systems’ data made available in Delta Lake for Data Science. [Diagram labels: Database; Pulsar IO; Pulsar Source; Pulsar, KoP, AoP, WebSocket, HTTP]
  • 28. Pulsar Virtual Summit Europe 2021 Pulsar IO Delta Lake Source. With the Pulsar IO Delta Lake source, users will be able to ingest Delta Lake changes into Pulsar without running a separate component. [Diagram labels: Delta Lake Source; New File; Removed File; Metadata Change; Parquet File; Update Schema; Write Records; Records or Metadata Change; Topic w/ Schema]
  • 29. Pulsar Virtual Summit Europe 2021 Future of Pulsar + Delta Lake. One of Pulsar’s unique features is tiered storage, which allows streams to be offloaded out of Apache BookKeeper into S3, GCS, etc. Work is in progress to offload data in Delta Lake compatible files, with the required metadata. This allows Pulsar to make streams available to Delta Lake without any need to copy data out of Pulsar, while the data can still be read as streams. Stay connected to learn more in early 2022!
  • 30. Pulsar Virtual Summit Europe 2021 Pulsar and Delta Lake are technologies designed to simplify your data infrastructure Connect with us on #connector-pulsar in Delta Lake Slack to learn more!
  • 31. Pulsar Virtual Summit Europe 2021 Thank-You!
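
Slides 11, 12, and 28 lean on one idea: an ordered transaction log of file-level changes. The following is a minimal sketch of that idea in plain Python, not the Delta Lake or Pulsar IO API. The class `ToyDeltaLog`, its methods, and the file names are hypothetical and exist only for illustration. It shows how replaying "add" and "remove" actions up to a chosen version gives time travel, and how the same log can be read as a stream of "New File" / "Removed File" events, roughly what a source connector would forward.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDeltaLog:
    """Toy append-only log (hypothetical); each commit is a list of (action, path) pairs."""

    commits: list = field(default_factory=list)

    def commit(self, actions):
        """Append one commit as a single unit and return its version number."""
        self.commits.append(list(actions))
        return len(self.commits) - 1

    def snapshot(self, version=None):
        """Replay commits 0..version and return the set of live data files."""
        if version is None:
            version = len(self.commits) - 1
        live = set()
        for actions in self.commits[: version + 1]:
            for action, path in actions:
                if action == "add":
                    live.add(path)
                elif action == "remove":
                    live.discard(path)
        return live

    def changes_since(self, version):
        """Yield (version, action, path) for every commit after `version`."""
        for v in range(version + 1, len(self.commits)):
            for action, path in self.commits[v]:
                yield (v, action, path)


log = ToyDeltaLog()
v0 = log.commit([("add", "part-0.parquet")])
v1 = log.commit([("add", "part-1.parquet")])
# A rewrite removes one file and adds its replacement in the same commit.
v2 = log.commit([("remove", "part-0.parquet"), ("add", "part-2.parquet")])

latest_files = log.snapshot()        # current table contents
files_at_v1 = log.snapshot(v1)       # "time travel" to version 1
events_after_v1 = list(log.changes_since(v1))  # change feed a source could emit
```

Because a reader only ever sees whole commits, a rewrite such as `v2` is never observed half-applied in this model; the real system adds concurrency control, checkpoints, and metadata actions that this sketch leaves out.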

Editor's Notes

  1. Today I’m pretty excited to have the chance to talk about Delta Lake and Lakehouse.
  2. Before we jump into Lakehouse, I would like to talk about the challenges people are facing today. Data infrastructure is too complicated and expensive to manage for advanced use cases. Many systems are not open. They use proprietary formats, and you need to copy data between multiple systems. Teams are not connected well enough to collaborate.
  3. This is one classic architecture example. The vast majority of data flows into a data lake. Companies do a lot of data validation so that they can serve data science and machine learning on top of these data lakes. At the same time, a huge amount of data is ETLed to many downstream data warehouses for business intelligence and other use cases. We have to do that because BI workloads are often too slow to run on a data lake directly. Depending on the workload, data also needs to be moved out of the data warehouse back to the data lake if it has been updated in the data warehouse. Increasingly, machine learning workloads are also reading from and writing to the data warehouses at the same time.
  4. The root of the problem is the inherent difference between data lakes and data warehouses. On one hand, we have data lakes that do a great job supporting machine learning. They have open formats and a big ecosystem on top of them. But they have poor support for business intelligence and suffer from complex data quality problems. [CLICK] On the other hand, we have data warehouses that are great for business intelligence applications. But they have limited support for machine learning workloads, and they are proprietary systems with only a SQL interface.
  5. Unifying these systems can be transformational in how we think about data. This is why we’re such big believers in the lakehouse: it provides one platform to unify all of your data, analytics, and AI, and allows all members of the data team to collaborate. By definition, a lakehouse is based on open standards and open source, because without being open, it’s impossible to create the unification across all data types, tools, and workloads. And the communities for the best open source projects to enable the lakehouse are in the audience today, across Apache Spark, Delta Lake, MLflow, and Redash.
  6. The foundation of your lakehouse is Delta Lake. Delta Lake allows you to build a data quality framework for your data lake by ensuring the data is reliable via ACID transactions. In this model, data flows into Delta Lake from a variety of data sources. The data quality is improved incrementally, from raw data to intermediate data to clean data. Then downstream applications such as machine learning, AI and BI can build on top of the clean, fresh and reliable data to fit the use cases it’s intended for. Delta Lake is the foundation of this model, an open source project that enables building a lakehouse architecture on top of existing storage systems such as S3, ADLS, GCS, and HDFS.
  7. Today Delta Lake is used all over the world. More than 350 PB of data per day gets processed on Delta Lake. 70% of the data scanned on the Databricks platform uses Delta Lake. And Delta Lake has been deployed by more than 3,000 customers in their production lakehouse architectures.
  8. So what key features does Delta Lake provide to help you build your lakehouse? Delta Lake provides ACID transactions, scalable metadata handling to process tables with billions of files, and data versioning, which provides the ability to time travel. Data is stored in an open format to allow users to leverage existing tools.
  9. In addition, Delta Lake unifies batch and streaming queries, making it easy to write either batch or streaming applications. Delta Lake automatically handles schema validation to prevent bad records from causing data corruption, and supports schema evolution. It allows you to audit history through transaction logs. Delta Lake also supports DML operations such as update, delete and merge.
  10. We also continue to improve Delta Lake for more use cases, and there are several exciting features coming soon. Delta Lake will allow users to drop a column or rename a column. You will be able to replace a portion of data in your table with new data atomically. The MERGE command will get better schema evolution support and generated column support. We also plan to make a release every 3 months so that people can get new Delta Lake features quickly.
  11. Moreover, the Delta Lake community is interested and excited to expand Delta Lake and make it available everywhere. We are building the Delta Standalone project to allow users to read and write Delta Lake natively. This enables the community to build a lot of connectors for a variety of systems, such as Pulsar, Flink, and Presto. Next, I will hand over to Addison to talk about the work on the Pulsar connector for Delta Lake.
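
Note 9 describes schema validation and schema evolution in words. As a rough illustration only, the plain-Python sketch below models both under simplifying assumptions: a schema is a dict from column name to Python type, a batch is a list of dicts, and the function name `write_batch` and its `merge_schema` flag are hypothetical (the flag is loosely inspired by the idea of opting in to schema changes, and is not the Delta Lake API). A batch is validated in full before anything is returned, so a single bad record rejects the whole write.

```python
def write_batch(schema, rows, merge_schema=False):
    """Validate `rows` against `schema` and return (schema_after_write, accepted_rows).

    Raises ValueError on a type mismatch, or on an unknown column unless
    merge_schema is True, in which case the new column is added to the schema.
    """
    new_schema = dict(schema)  # never mutate the caller's schema
    for row in rows:
        for column, value in row.items():
            if column not in new_schema:
                if not merge_schema:
                    raise ValueError(f"unknown column {column!r}")
                new_schema[column] = type(value)  # schema evolution: add column
            elif not isinstance(value, new_schema[column]):
                expected = new_schema[column].__name__
                raise ValueError(f"column {column!r} expects {expected}")
    # Only reached if every row passed, so the batch is accepted as a whole.
    return new_schema, list(rows)


schema = {"id": int, "event": str}

# A well-formed batch is accepted and leaves the schema unchanged.
schema_after, accepted = write_batch(schema, [{"id": 1, "event": "click"}])

# A record with the wrong type for "id" rejects the entire batch.
try:
    write_batch(schema, [{"id": 2, "event": "view"}, {"id": "3", "event": "view"}])
    rejected = False
except ValueError:
    rejected = True

# Opting in to evolution lets a new column extend the schema.
evolved, _ = write_batch(
    schema, [{"id": 4, "event": "view", "country": "DE"}], merge_schema=True
)
```

The sketch ignores nullability, nested types, and type widening, all of which a real table format has to define.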