Big Mountain Data and Dev Conference: Apache Pulsar with MQTT for Edge Computing
1. Apache Pulsar with MQTT for Edge Computing
Timothy Spann | Developer Advocate
Big Mountain Data and Dev Conference
2. Tim Spann
Developer Advocate
● https://www.datainmotion.dev/
● https://github.com/tspannhw/SpeakerProfile
● https://dev.to/tspannhw
● https://sessionize.com/tspann/
DZone Zone Leader and Big Data MVB
Data DJay
3. Founded by the original developers of
Apache Pulsar and Apache BookKeeper,
StreamNative builds a cloud-native event
streaming platform that enables
enterprises to easily access data as
real-time event streams.
9. Pulsar is built for easy scale-out.
*Illustrations by Jack Vanlightly
10. Key Milestones
2012
● Originally developed inside Yahoo! as “Cloud Messaging Service”
2016
● Pulsar is committed to Open Source
2017
● Pulsar is accepted into the Apache Software Foundation
2018
● Pulsar becomes a Top-Level Project
2019
● StreamNative is founded and seed round raised
● Tencent adopts Pulsar for payment processing platform
● BestPay adopts Pulsar for payment processing
● Pulsar hits 200 contributors
2020
● 2 global Pulsar conferences, 80+ speakers, 1,500+ attendees
● Pulsar hits 340 contributors
● StreamNative and OVHCloud launch Kafka on Pulsar (KoP)
● StreamNative + China Mobile launch AMQP on Pulsar (AoP)
● Pulsar Ecosystem expands - StreamNative Hub launches
● StreamNative Cloud launches on GCP and Alibaba Cloud
● StreamNative customer adoption continues - new customers include Flipkart and Applied Materials
● Pulsar 2.7 + Transactions
● Pulsar Flink Connector 2.7
2021
● 3 global Pulsar conferences
● Pulsar hits 400 contributors (June)
● Pulsar surpasses Kafka in monthly active contributors
● Pulsar 2.8 + Exactly-Once semantics
● StreamNative Platform launches
Major increase in adoption following TLP designation in 2018
11. Apache Pulsar Overview
Enable Geo-Replicated Messaging
● Pub-Sub
● Geo-Replication
● Pulsar Functions
● Horizontal Scalability
● Multi-tenancy
● Tiered Persistent Storage
● Pulsar Connectors
● REST API
● CLI
● Many clients available
● Four Different Subscription Types
● Multi-Protocol Support
○ MQTT
○ AMQP
○ JMS
○ Kafka
○ ...
12. Pulsar’s Publish-Subscribe model
[Diagram: Producer 1 and Producer 2 publish to a Topic on the Broker; a Subscription delivers the messages to Consumer 1, Consumer 2 and Consumer 3.]
● Producers send messages.
● A topic is an ordered, named channel that producers use to transmit messages to subscribed consumers.
● Messages belong to a topic and contain an arbitrary payload.
● Brokers handle connections and route messages between producers and consumers.
● Subscriptions are named configuration rules that determine how messages are delivered to consumers.
● Consumers receive messages.
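The producer, topic, subscription and consumer roles above can be sketched with the Pulsar Python client (`pulsar-client`). This is a minimal sketch, not code from the talk: the service URL, topic name and subscription name are assumptions chosen for illustration, and a broker is assumed to be reachable at that URL.

```python
import json

# Assumptions for illustration only: a local broker and hypothetical names.
SERVICE_URL = "pulsar://localhost:6650"
TOPIC = "persistent://public/default/sensor-readings"
SUBSCRIPTION = "edge-demo-sub"


def encode(reading: dict) -> bytes:
    """Serialize a reading to bytes; a Pulsar payload is an arbitrary byte array."""
    return json.dumps(reading, sort_keys=True).encode("utf-8")


def decode(payload: bytes) -> dict:
    """Inverse of encode()."""
    return json.loads(payload.decode("utf-8"))


def publish_and_consume(reading: dict) -> dict:
    """Send one message to the topic and read one message back via a subscription."""
    # Imported here so the helpers above work without the client installed
    # (pip install pulsar-client).
    import pulsar

    client = pulsar.Client(SERVICE_URL)
    try:
        # Subscribe first so the subscription exists before the message is sent.
        consumer = client.subscribe(TOPIC, SUBSCRIPTION)
        producer = client.create_producer(TOPIC)
        producer.send(encode(reading))
        msg = consumer.receive(timeout_millis=10000)
        consumer.acknowledge(msg)  # tell the broker the message was processed
        return decode(msg.data())
    finally:
        client.close()
```

Calling `publish_and_consume({"sensor": "s1", "temp_c": 21.5})` against a running broker should return a reading; if the subscription already holds unacknowledged messages from an earlier run, the oldest of those is returned first.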
13. What is the Pulsar Ecosystem?
● Functions and Connectors
○ Functions: Lightweight stream processing
○ Connectors: Part of “Pulsar IO”, includes “Source” and “Sink”
APIs
■ Files, Databases, Data tools, Cloud Services, etc
● Protocol Handlers
○ Allow Pulsar to handle additional protocols through an extensible API that runs in the broker
■ AoP (AMQP), KoP (Kafka), MoP (MQTT)
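With MoP enabled, an ordinary MQTT client can publish to the Pulsar broker. The following is a hedged sketch using the `paho-mqtt` library; the host, port and topic name are assumptions for illustration, and how an MQTT topic name maps to a Pulsar topic depends on the MoP configuration in use.

```python
import json

# Assumptions for illustration only.
MQTT_HOST = "localhost"          # a Pulsar broker with the MoP handler loaded
MQTT_PORT = 1883                 # whatever port the MQTT listener is configured on
MQTT_TOPIC = "sensor-readings"   # hypothetical topic name


def build_payload(device_id: str, temp_c: float) -> str:
    """Build the JSON text that an edge device would send."""
    return json.dumps({"device": device_id, "temp_c": temp_c}, sort_keys=True)


def publish_reading(device_id: str, temp_c: float) -> None:
    """Publish one reading over MQTT; the broker side is Pulsar, not an MQTT server."""
    # Imported here so build_payload() works without the library installed
    # (pip install paho-mqtt).
    import paho.mqtt.publish as publish

    publish.single(
        MQTT_TOPIC,
        payload=build_payload(device_id, temp_c),
        hostname=MQTT_HOST,
        port=MQTT_PORT,
    )
```

The edge device only needs an MQTT library; downstream consumers can read the same data with any Pulsar client.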
15. Pulsar subscription modes
Different subscription modes have different semantics:
● Exclusive/Failover - guaranteed order, single active consumer
● Shared - multiple active consumers, no ordering guarantee
● Key_Shared - multiple active consumers, order guaranteed for a given key
[Diagram: Producer 1 and Producer 2 publish to one Pulsar topic that has four subscriptions.]
● Subscription A (Exclusive): Consumer A
● Subscription B (Failover): Consumer B-1, with Consumer B-2 taking over in case of failure in Consumer B-1
● Subscription C (Shared): Consumer C-1 and Consumer C-2; keys are interleaved across consumers: <K1,V10> <K2,V21> <K1,V12> and <K2,V20> <K1,V11> <K2,V22>
● Subscription D (Key_Shared): Consumer D-1 and Consumer D-2; messages are grouped by key: <K1,V10> <K1,V11> <K1,V12> and <K2,V20> <K2,V21> <K2,V22>
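The delivery semantics of the subscription modes can be illustrated with a small in-memory model. This is a toy sketch, not Pulsar's dispatcher: round-robin for Shared and a CRC32 hash for Key_Shared are simplifying assumptions, and the function name `dispatch` is hypothetical. In the real Python client the mode is chosen with the `consumer_type` argument of `client.subscribe`, for example `pulsar.ConsumerType.KeyShared`.

```python
import zlib
from collections import defaultdict
from itertools import cycle

# The six keyed messages used in the slide's diagram.
MESSAGES = [("K1", "V10"), ("K1", "V11"), ("K1", "V12"),
            ("K2", "V20"), ("K2", "V21"), ("K2", "V22")]


def dispatch(messages, consumers, mode):
    """Return {consumer: [messages]} for a toy model of one subscription mode."""
    delivered = defaultdict(list)
    if mode in ("exclusive", "failover"):
        # One active consumer receives everything, so topic order is preserved.
        # In failover mode the next consumer would take over only on failure.
        for message in messages:
            delivered[consumers[0]].append(message)
    elif mode == "shared":
        # Messages are spread across consumers; per-key order is not preserved.
        round_robin = cycle(consumers)
        for message in messages:
            delivered[next(round_robin)].append(message)
    elif mode == "key_shared":
        # Every message with the same key goes to the same consumer.
        for key, value in messages:
            index = zlib.crc32(key.encode("utf-8")) % len(consumers)
            delivered[consumers[index]].append((key, value))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return dict(delivered)
```

Running `dispatch(MESSAGES, ["C-1", "C-2"], "shared")` splits K1 and K2 across both consumers, while `"key_shared"` keeps each key on a single consumer in its original order.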
16. Pulsar API Design
[Diagram: the Pulsar APIs and who uses them, organized by tenant and team.]
● Reader and Batch API: stream processor applications
● Pulsar IO/Connectors: prebuilt connectors and custom connectors
● Pub/Sub API: publishers and subscribers in microservices or event-driven architectures
● Admin API: operators and administrators
17. What is the Pulsar Ecosystem? (cont’d)
● Processing Engines
○ Pulsar integrates with modern processing engines
■ Flink and Spark, as well as Pulsar SQL (Presto/Trino)
● Offloaders
○ Allow data to be offloaded to cloud storage and still be read through the existing Pulsar APIs
■ S3, GCP Cloud Storage, HDFS, File (NFS), Azure Blob Storage (in Pulsar 2.7.0)
21. Pulsar Functions
Provides a simple API to:
● Receive a message (consume)
● Process the message using your own code
● Send a message (produce)
Takes care of the boilerplate code so there is no need to create
producers and consumers.
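The consume, process, produce cycle can be sketched as a language-native Python function, where Pulsar calls `process` once per message and publishes the return value to the output topic. This is a minimal sketch under assumptions: the JSON payload shape and the `temp_c` / `temp_f` field names are hypothetical, and the input and output topics are chosen at deployment time (for example with `pulsar-admin functions create`), not in the code.

```python
import json


def process(input):
    """Add a Fahrenheit field to a JSON sensor reading.

    The parameter is named `input` to follow the naming used in Pulsar's
    function examples, even though it shadows the Python builtin.
    """
    # Assumption: the payload is JSON text; accept raw bytes as well.
    if isinstance(input, bytes):
        input = input.decode("utf-8")
    reading = json.loads(input)
    reading["temp_f"] = round(reading["temp_c"] * 9 / 5 + 32, 1)
    # The returned string becomes the payload of the outgoing message.
    return json.dumps(reading, sort_keys=True)
```

No producer or consumer appears in the code, which is the point of the slide. Pulsar also offers an SDK base class, `pulsar.Function`, whose `process(self, input, context)` method gives access to a context object when the function needs it.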
22. Moving Data In and Out of Pulsar
IO/Connectors are a simple way to integrate with external systems and move data
in and out of Pulsar.
● Built on top of Pulsar Functions
● Built-in connectors - hub.streamnative.io
Source Sink
23. Use Azure BlobStore offloader with Pulsar
https://pulsar.apache.org/docs/en/tiered-storage-azure/
30. Powered by Apache Pulsar, StreamNative provides a cloud-native,
real-time messaging and streaming platform to support multi-cloud
and hybrid cloud strategies.
Built for Containers
Cloud Native
StreamNative Cloud
Flink SQL
40. Connect with the Community & Stay Up-To-Date
● Join the Pulsar Slack channel - Apache-Pulsar.slack.com
● Follow @streamnativeio and @apache_pulsar on Twitter
● Subscribe to Monthly Pulsar Newsletter for major news, events,
project updates, and resources in the Pulsar community
42. Interested In Learning More?
Resources
● Flink SQL Cookbook
● The GitHub Source for Flink SQL Demo
● The GitHub Source for Demo
Free eBooks
● Manning's Apache Pulsar in Action
● O’Reilly Book
Upcoming Events
● [10/21] Trino Summit