NYC Dec 2022 Meetup: Building Real-Time Requires a Team
https://www.meetup.com/new-york-city-apache-pulsar-meetup/events/289817171/
We are excited to invite you to an in-person meetup that is all about streaming data!
If you are interested in Apache Pinot, Apache Pulsar, Apache Flink, and Apache NiFi, this event is for you!
AGENDA
6:00 - 6:30 PM EST: Food, Drink, and Networking!!!
6:30 - 7:15 PM EST: Introduction to Real-Time Analytics with Apache Pinot: David G. Simmons @ StarTree
7:15 - 8:00 PM EST: Building Real-Time Requires a Team: Tim Spann, Developer Advocate @ StreamNative
8:00 - 8:30 PM EST: Round Table
8:30 - 9:00 PM EST: Q&A + Networking
----
“Building Real-Time Requires a Team” - Tim Spann, Developer Advocate @ StreamNative
This talk will discuss building real-time streaming applications with the best open-source systems. The FLiPPN Stack, consisting of Apache Flink, Apache Pulsar, Apache Pinot, and Apache NiFi, supercharges the building of any real-time app or pipeline. I will walk you through the what, why, how, and where of using these amazing tools. We will walk through some demos and dive into the source code. By the end, we will have driven real-time events into Apache Pinot, setting the stage for your real-time, user-facing insights.
Tim Spann is a Developer Advocate for StreamNative. He works with StreamNative Cloud, Apache Pulsar™, Apache Flink®, Flink® SQL, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal, and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit, and many more.
Pronouns: He, They
https://github.com/tspannhw/SpeakerProfile
“Introduction to Real-Time Analytics with Apache Pinot” - David G. Simmons, Head of Developer Advocacy @ StarTree
https://notist.davidgs.com/
We will simulcast: https://streamnative.zoom.us/meeting/register/tZcpf-iurTojHtTJjwhi87e_iKJJYhrONpAG
2. Proprietary & Confidential | 2
Tim Spann
Developer Advocate
at StreamNative
FLiP(N) Stack = Flink, Pulsar and NiFi Stack
Streaming Systems & Data Architecture Expert
Experience:
● 15+ years of experience with streaming technologies, including Pulsar, Flink, Spark, NiFi, Big Data, Cloud, MXNet, IoT, Python and more.
● Today, he helps grow the Pulsar community by sharing rich technical knowledge and experience, both at global conferences and through individual conversations.
https://bit.ly/32dAJft
FLiP Stack Weekly
This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark and open source friends.
Verticals and use cases
IoT
● Predictive maintenance
● Track and trace
● Connected supply chain
● Geo-location based alerts
Telecommunications
● Network optimization
● Churn prevention
● Real-time in-service promos & discounting
Data Lake
● Data pipeline acceleration
● Real-time analytics
● Real-time decisioning
A streaming data platform for cloud-native, event-driven applications.
Founded by the original creators of Apache Pulsar.
StreamNative employs more than 50% of the active core committers to Apache Pulsar.
StreamNative has more experience designing, deploying, and running large-scale Apache Pulsar instances than any team in the world.
Apache Pulsar has a vibrant community:
● 560+ contributors
● 10,000+ commits
● 7,000+ Slack members
● 1,000+ organizations using Pulsar
Pulsar Cluster
● “Bookies”
  ● Store messages and cursors
  ● Messages are grouped in segments/ledgers
  ● A group of bookies forms an “ensemble” to store a ledger
● “Brokers”
  ● Handle message routing and connections
  ● Stateless, but with caches
  ● Automatic load-balancing
  ● Topics are composed of multiple segments
● Metadata storage
  ● Stores metadata for both Pulsar and BookKeeper
  ● Service discovery
(Diagram: brokers store messages on the bookies; brokers and bookies both rely on the metadata storage for metadata and service discovery.)
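The slide says a group of bookies forms an "ensemble" to store a ledger, but not how entries are spread across it. The sketch below is a simplified model of round-robin striping, where each entry is written to a fixed number of consecutive bookies in the ensemble. The write quorum parameter and the function name `write_set` are assumptions introduced here, not taken from the slide. Treat this as an illustration of the idea rather than BookKeeper's exact placement code.

```python
# Simplified model (assumption): round-robin striping of ledger entries over an
# ensemble of bookies. `write_quorum` is a hypothetical parameter name; the
# slide only mentions the ensemble itself.


def write_set(entry_id: int, ensemble: list[str], write_quorum: int) -> list[str]:
    """Return the bookies that would store a given ledger entry.

    Entry i goes to `write_quorum` consecutive bookies, starting at
    position i mod E (E = ensemble size) and wrapping around the ensemble.
    """
    e = len(ensemble)
    if not 1 <= write_quorum <= e:
        raise ValueError("write quorum must be between 1 and the ensemble size")
    return [ensemble[(entry_id + k) % e] for k in range(write_quorum)]


if __name__ == "__main__":
    bookies = ["bookie-1", "bookie-2", "bookie-3"]
    for entry in range(4):
        # Each entry lands on two bookies, and consecutive entries shift by one.
        print(entry, write_set(entry, bookies, write_quorum=2))
```

With three bookies and a write quorum of two, the load rotates across the ensemble, so no single bookie stores every entry of the ledger.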
Messages - the basic unit of Pulsar
Component | Description
Value / data payload | The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas.
Key | Messages are optionally tagged with keys, which are used in partitioning and are also useful for things like topic compaction.
Properties | An optional key/value map of user-defined properties.
Producer name | The name of the producer that produces the message. If you do not specify a producer name, the default name is used.
Sequence ID | Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of a message is its order in that sequence.
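To make the message components concrete, the sketch below models them as a plain Python dataclass. It also shows why keys are useful for topic compaction: after compaction, only the latest message per key needs to be kept. The `Message` class and `compact` function are hypothetical stand-ins, not the Pulsar client API. Dropping keyless messages is a simplification for this sketch, so check the Pulsar compaction documentation for the actual rules.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    """Hypothetical stand-in for a Pulsar message (not the client library class)."""
    value: bytes                               # raw payload bytes
    key: Optional[str] = None                  # optional key (partitioning, compaction)
    properties: dict[str, str] = field(default_factory=dict)  # user-defined key/value map
    producer_name: Optional[str] = None        # None: the client would assign a default name
    sequence_id: int = 0                       # position in the ordered sequence


def compact(messages: list[Message]) -> list[Message]:
    """Keep only the latest message per key, in original topic order.

    Assumption for this sketch: messages without a key are dropped.
    """
    last_index: dict[str, int] = {}
    for i, msg in enumerate(messages):
        if msg.key is not None:
            last_index[msg.key] = i            # later messages overwrite earlier ones
    return [messages[i] for i in sorted(last_index.values())]


if __name__ == "__main__":
    topic = [
        Message(b"20C", key="sensor-1", sequence_id=0),
        Message(b"18C", key="sensor-2", sequence_id=1),
        Message(b"21C", key="sensor-1", sequence_id=2),
    ]
    for m in compact(topic):
        print(m.key, m.value)
```

A consumer that reads the compacted view sees one current value per sensor instead of replaying every reading.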
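Streaming and messaging: subscription types on a Pulsar topic/partition
● Exclusive: only one consumer may attach to the subscription; a second consumer is rejected.
● Failover: several consumers attach, but one of them receives the messages; in case of failure in Consumer B-0, another consumer takes over.
● Shared: messages are distributed across all consumers attached to the subscription.
● Key-Shared: messages with the same key are always delivered to the same consumer.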
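The following in-memory model sketches how the Shared, Key-Shared, and Failover subscription types differ in who receives which message. The function names are hypothetical. Hashing keys with `zlib.crc32` modulo the consumer count is a simplification chosen for this sketch, since Pulsar's own Key-Shared dispatcher assigns hash ranges and is more involved. Failover here simply picks the first healthy consumer in list order.

```python
import zlib
from itertools import cycle

Record = tuple[str, str]  # (key, value); hypothetical simplified message


def dispatch_shared(records: list[Record], consumers: list[str]) -> dict[str, list[str]]:
    """Shared: spread messages round-robin across all consumers."""
    assignment: dict[str, list[str]] = {c: [] for c in consumers}
    rr = cycle(consumers)
    for _key, value in records:
        assignment[next(rr)].append(value)
    return assignment


def dispatch_key_shared(records: list[Record], consumers: list[str]) -> dict[str, list[str]]:
    """Key-Shared (simplified): hash each key so one key always maps to one consumer."""
    assignment: dict[str, list[str]] = {c: [] for c in consumers}
    for key, value in records:
        idx = zlib.crc32(key.encode()) % len(consumers)
        assignment[consumers[idx]].append(value)
    return assignment


def dispatch_failover(records: list[Record], consumers: list[str],
                      failed: frozenset = frozenset()) -> dict[str, list[str]]:
    """Failover (simplified): the first healthy consumer receives everything."""
    assignment: dict[str, list[str]] = {c: [] for c in consumers}
    active = next((c for c in consumers if c not in failed), None)
    if active is None:
        return assignment  # no healthy consumer: nothing delivered in this sketch
    for _key, value in records:
        assignment[active].append(value)
    return assignment


if __name__ == "__main__":
    msgs = [("a", "1"), ("b", "2"), ("a", "3"), ("c", "4")]
    print(dispatch_shared(msgs, ["C0", "C1"]))
    print(dispatch_key_shared(msgs, ["C0", "C1"]))
    print(dispatch_failover(msgs, ["B-0", "B-1"], failed=frozenset({"B-0"})))
```

Shared maximizes parallelism but loses per-key ordering. Key-Shared keeps ordering per key while still spreading load. Failover keeps total ordering with a standby ready to take over.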
Integrated Schema Registry
The schema registry holds schema-1, schema-2, and schema-3, each with an Avro, Protobuf, or JSON value. Producers and consumers each keep a local cache for schemas, and each message carries its data plus a schema ID.
● Producers: send (register) the schema if it is not in the local cache, then send schema-1 data serialized per schema ID.
● Consumers: get the schema by ID if it is not in the local cache, then read schema-1 data deserialized per schema ID.
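The sketch below walks through the producer and consumer flow from the diagram: register on a cache miss, tag data with a schema ID, and look up the schema by ID on a cache miss. All class names are hypothetical. JSON serves as the payload format because it is in the standard library. The 4-byte schema-ID prefix is an invented wire format for illustration, not how Pulsar encodes schema information, and schema validation is omitted.

```python
import json
import struct


class SchemaRegistry:
    """Hypothetical in-memory registry: schema definition <-> integer ID."""

    def __init__(self) -> None:
        self._by_id: dict[int, str] = {}
        self._by_def: dict[str, int] = {}

    def register(self, schema: str) -> int:
        if schema not in self._by_def:            # new schema: assign the next ID
            schema_id = len(self._by_id) + 1
            self._by_id[schema_id] = schema
            self._by_def[schema] = schema_id
        return self._by_def[schema]

    def get(self, schema_id: int) -> str:
        return self._by_id[schema_id]


class Producer:
    def __init__(self, registry: SchemaRegistry) -> None:
        self.registry = registry
        self.cache: dict[str, int] = {}           # local cache: schema -> ID
        self.registry_calls = 0

    def serialize(self, schema: str, record: dict) -> bytes:
        if schema not in self.cache:              # register only on a cache miss
            self.cache[schema] = self.registry.register(schema)
            self.registry_calls += 1
        # Hypothetical wire format: 4-byte big-endian schema ID + JSON payload.
        return struct.pack(">I", self.cache[schema]) + json.dumps(record).encode()


class Consumer:
    def __init__(self, registry: SchemaRegistry) -> None:
        self.registry = registry
        self.cache: dict[int, str] = {}           # local cache: ID -> schema
        self.registry_calls = 0

    def deserialize(self, data: bytes) -> tuple[str, dict]:
        (schema_id,) = struct.unpack(">I", data[:4])
        if schema_id not in self.cache:           # fetch by ID only on a cache miss
            self.cache[schema_id] = self.registry.get(schema_id)
            self.registry_calls += 1
        return self.cache[schema_id], json.loads(data[4:])
```

Usage: after the first message, both sides serve the schema from their local caches, so the registry is contacted once per schema rather than once per message.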
Pulsar Functions: a serverless event streaming framework
● Lightweight computation similar to AWS Lambda.
● Specifically designed to use Apache Pulsar as a message bus.
● The function runtime can be located within the Pulsar broker.
● Java Functions
Pulsar Functions
● Consume messages from one or more Pulsar topics.
● Apply user-supplied processing logic to each message.
● Publish the results of the computation to another topic.
● Support multiple programming languages (Java, Python, Go).
● Can leverage 3rd-party libraries to support the execution of ML models on the edge.
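The consume, process, and publish loop described above can be modeled locally with in-memory queues standing in for topics. Here `process` is user-supplied logic shaped like a plain Python function that takes one input and returns one output. The runner `run_function`, the threshold alert logic, and the "return None to publish nothing" convention are all assumptions of this sketch. This is not the Pulsar Functions runtime or deployment tooling.

```python
from collections import deque
from typing import Callable, Optional


def process(reading: str) -> Optional[str]:
    """User-supplied logic (hypothetical): flag readings above 30 degrees.

    Input format assumed for this sketch: "<sensor>,<temperature>".
    Returning None means nothing is published for this message.
    """
    sensor, temp = reading.split(",")
    if float(temp) > 30.0:
        return f"ALERT {sensor} {temp}"
    return None


def run_function(fn: Callable[[str], Optional[str]],
                 input_topics: list[deque],
                 output_topic: deque) -> None:
    """Hypothetical local runner: consume from each input topic in turn,
    apply fn to every message, and publish non-None results."""
    for topic in input_topics:
        while topic:
            result = fn(topic.popleft())          # consume one message
            if result is not None:
                output_topic.append(result)       # publish to the output topic


if __name__ == "__main__":
    sensors_a = deque(["s1,25.0", "s2,31.5"])
    sensors_b = deque(["s3,40"])
    alerts: deque = deque()
    run_function(process, [sensors_a, sensors_b], alerts)
    print(list(alerts))
```

Keeping `process` free of any messaging code mirrors the idea on the slide: the framework handles consuming and publishing, while the function body contains only the per-message logic.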